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Two  notable  advances  in  numerical  methods  were  supported  by  this  grant. 

First,  a  fast  algorithm  was  derived,  analyzed,  and  tested  for  the  Fourier  transform  of  pi 
ecewise  polynomials  given  on  d-dimensional  simplices  in  D-dimensional  Euclidean  space.  T 
hese  transforms  play  a  key  role  in  computational  problems  ranging  from  medical  imaging  to 
partial  differential  equations,  and  existing  algorithms  are  inaccurate  and/or  prohibitiv 
ely  slow  for  d  >  0.  The  algorithm  employs  low-rank  approximation  by  Taylor  series  organi 
zed  in  a  butterfly  scheme,  with  moments  evaluated  by  a  new  dimensional  recurrence  and  sim 
plex  quadrature  rules.  For  moderate  accuracy  and  problem  size  it  runs  orders  of  magnitud 
e  faster  than  direct  evaluation,  and  one  to  three  orders  of  magnitude  slower  than  the  cla 
ssical  uniform  Fast  Fourier  Transform. 

Second,  bilinear  quadratures  - which  numerically  evaluate  continuous  bilinear  maps,  such 

as  the  L2  inner  product,  on  continuous  f  and  g  belonging  to  known  finite-dimensional  fun 

ction  spaces - were  analyzed  and  developed.  Such  maps  arise  in  Galerkin  methods  for  diff 

erential  and  integral  equations.  Bilinear  quadratures  were  constructed  over  arbitrary  D- 
dimensional  domains.  In  one  dimension,  integration  rules  of  this  type  include  Gaussian  q 
uadrature  for  polynomials  and  the  trapezoidal  rule  for  trigonometric  polynomials.  A  numer 
ical  procedure  for  constructing  bilinear  quadratures  was  developed  and  validated. 
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1  Introduction 


In  this  paper,  we  present  a  new  algorithm  for  the  fast  evaluation  of  the  exact 
Fourier  transform 


1  <  k  <  N, 


(1) 


of  piecewise-polynomial  densities  fj(s)  defined  on  simplices  Sj  C  RD,  at  ar¬ 
bitrary  points  tk  G  RD.  Here  i  =  y  — 1,  the  ambient  dimension  D  >  1,  the 
simplex  dimension  d  satisfies  D  >  d  >  0,  and  each  S3  is  a  d- dimensional 
simplex  of  the  form 

d  d 

S  =  {s  =  v0  +  Y^  0i(,Vi  -  v0 )  |  di  >  0,  0i  <  1})  (2) 

%—  1  i=  1 


consisting  of  all  convex  combinations  of  d  +  1  given  vertices  vt  G  RD.  A  wide 
variety  of  useful  special  cases  occur  frequently  in  applications. 

When  the  simplex  dimension  d  —  0,  each  simplex  Sj  is  a  point  Sj,  and  each 
fj  is  a  constant.  Transform  (1)  becomes  the  pointwise  nonuniform  Fourier 
transform 

N 

f(h)  =  exP(i  tksj)fj,  l<k<N,  (3) 

3= 1 


where  Sj  G  RD  and  A  G  RD  are  given  D-vectors  and  fj  is  a  complex  number. 
The  special  case  where  s3  and  tk  form  regular  cubical  grids  corresponds  to 
the  classical  uniform  Fast  Fourier  Transform  [1,2].  Several  classes  of  fast  algo¬ 
rithms  for  pointwise  nonuniform  transforms  such  as  (3)  are  now  well-known 
[3-8],  but  fast  algorithms  for  the  more  general  transform  (1)  are  less  developed 
[9-13]. 


When  the  simplex  dimension  d—  1,  each  simplex  Sj  is  a  line  segment  [u]}  Vj] 
connecting  endpoints  Uj,  Vj  G  RD,  and  parametrized  by  s  =  Uj  +  a(vj  —  Uj ) 
for  0  <  cr  <  1.  Transform  (1)  becomes 


N  1 

f(tk)  =  II vi  ~  “ill  / exP(i tl(uj  +  a(vj  -  Uj)))fj(a )  da, 
i=1  n 


1  <  k  <  N. 


When  d  —  2,  each  simplex  Sj  is  a  triangle  A (uj,Vj,Wj)  with  vertices  Uj,  Vj , 
Wj  G  RD,  parametrized  by  s  =  Uj  +  a(v3  —  Uj)  +  9(wj  —  Uj)  where  0  <  9  <  1, 


tnbi 


DISTRIBUTION  A:  Distribution  approved  for  public  release. 


0<(T<1,  cr  +  6<l.  Transform  (1)  becomes 

N 

f(tk )  =  y/\\v3  -  uj\\2\\wj  -  “ill2  -  ((v3  -  uj)T(wj  -  uj ))2 

3= i 

1  l-CT 

•  J  j  exp(i t£(uj  +  cr(i)j  —  Uj )  +  0(wj  —  Uj)))fj(a ,  0)  cl#  da,  1  <  k  <  N. 
o  o 

For  simplex  dimension  d  >  1,  the  transforms  (1)  arise  in  the  discretization 
of  geometric  objects  by  line  segments,  triangles  and  tetrahedra.  For  example, 

Fig.  1  displays  L-level  approximate  Sierpinski  paths  composed  of  line  segments 
Sj  with  simplex  dimension  d  —  1  in  ambient  dimension  D  =  2,  together  with 
the  Fourier  transforms  of  unit  data  /  =  1  on  these  paths.  As  L  increases,  the 
Sierpinski  paths  tend  to  fill  a  triangle  T  and  the  Fourier  transform  /  tends  to 
the  Fourier  transform  of  the  characteristic  function  of  T. 

Our  algorithm  evaluates  transform  (1)  in  0(iV  log  At  log  e)  work  to  accuracy 
e,  for  arbitrary  d  and  D.  It  is  globally  structured  as  in  the  butterfly  algorithm 
of  [14],  with  local  transformations  based  on  multidimensional  Taylor  series. 

Thus  it  groups  source  simplices  Sj  and  target  points  A  into  hierarchical  tree 
structures,  approximates  the  kernel  exp(ifTs)  by  low-rank  expansions,  and 
transforms  the  low-rank  expansions  from  source-local  to  target-local  form. 

The  main  new  ingredient  is  a  stable  efficient  dimensional  recurrence  for  local 
source  moments  of  polynomials  over  simplices.  Our  derivation,  analysis  and 
implementation  all  operate  with  arbitrary  ambient  dimension  D  >  1,  arbitrary 
simplex  dimension  0  <  d  <  D,  and  arbitrary  polynomial  degree  deg (fj)  =  p  > 

0.  Despite  this  generality,  our  implementation  runs  orders  of  magnitude  faster 
than  direct  evaluation  and  even  compares  favorably  with  the  classical  uniform 
Fast  Fourier  Transform. 

The  paper  is  organized  as  follows.  In  Section  2,  we  review  mathematical  tools 
such  as  error  bounds  for  Taylor  expansion  of  exponential  functions,  and  spatial 
tree  structures  for  localizing  target  points.  In  Section  3,  we  combine  these 
tools  to  derive  a  pointwise  (d  =  0)  butterfly  algorithm  similar  to  [14].  In 
Section  4,  we  generalize  the  butterfly  algorithm  to  piecewise  polynomials  fj 
on  simplices  of  dimension  d  >  1.  The  main  new  tool  is  a  stable  efficient 
dimensional  recurrence  which  evaluates  exponential-polynomial  moments  in 
dimension  d  >  1  by  a  combination  of  simplex  quadrature  and  recurrence 
to  simplex  dimension  d  —  1.  In  Section  5  we  present  numerical  experiments 
which  verify  the  efficiency  and  accuracy  of  the  approach.  In  Section  6  we 
discuss  potential  speedups  with  optimized  basis  functions  [15],  the  evaluation 
of  Fourier- Galer kin  matrix  elements  [16],  and  extensions  to  Laplace  and  Gauss 
transforms  [17-19]. 


tri 
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(a)  5-level  path 


(c)  10-level  path 


(b)  Fourier  transform 


(d)  Fourier  transform 


Fig.  1.  Sierpinski  paths  and  Fourier  transforms 
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2  Mathematical  preliminaries 


We  review  classical  tools  for  low-rank  kernel  approximation  and  hierarchical 
point  clustering.  The  exponential  kernel  of  transform  (1)  is  approximated  by 
Taylor  expansion.  A  simple  error  estimate  delineates  the  region  where  Taylor 
expansion  is  accurate.  Tree  structures  are  developed  to  organize  source  and 
target  points  into  groups  where  Taylor  expansion  is  accurate.  The  classical 
pointwise  butterfly  algorithm  combines  these  tools,  while  our  new  piecewise- 
polynomial  butterfly  algorithm  involves  additional  ingredients. 


2.1  Low-rank  kernel  approximation 


The  multidimensional  Taylor  expansion 

OO 

exp(i  tTs)  =  Y, 


oo  y 

i= 0 


(x^-=i  tjsj 


n\ 


;M 

=  £ 


a>0 


a\ 


(4) 


of  the  complex  exponential  kernel  follows  immediately  from  the  multinomial 
theorem 


ta 

—  n\  y  — . 

\aj=n  al 


(5) 


Here 

t  and  s  are  real  or  complex  .D-vectors, 

tTs  =  tiSi  +  •  •  •  +  tnSD  is  their  inner  product, 

a  =  (ai, . . . ,  an)  G  ND  is  a  D- dimensional  multiindex  of  order 

|a|  —  c^i  +  •  •  •  +  Old, 

ta  —  ti1  ■  ■  ■  t^f  and  s“  are  monomials,  and 
a\  =  Qri ! c^2 !  ■  ■  ■  Qf ! . 

According  to  Stirling’s  inequality  ml  >  (■ m/e)m ,  terminating  expansion  (4) 
after  M  =  terms  of  order  \a\  <m  incurs  truncation  error  Em  bounded 

by 
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as  long  as  |fTs|  <  R  where  R/m  <  0.27  or  equivalently  m  >  3.8 R.  For 
example,  with  R  =  1  accuracy  e  =  10”6  is  guaranteed  with  m  —  11  and 
accuracy  e  =  10~12  is  guaranteed  with  m  =  16.  In  general,  the  number  of 
terms  M  increases  polylogarithmically  as  e  decreases. 

The  utility  of  expansion  (6)  can  be  easily  demonstrated  with  unrealistically 
distributed  sets  of  targets  and  sources.  Suppose  all  the  sources  Sj  and  targets 
tk  in  the  pointwise  transform  (3)  are  clustered  near  0  so  that  \tkSj\  <  R ,  where 
R  and  m  are  chosen  to  guarantee  error  \Em\  <  e.  Then  expansion  (4)  speeds 
up  the  pointwise  transform  (3)  as  follows: 


N 

/(4)  =  5Zexp(ifJsi)/j  (7) 

3= 1 

N  -H 

=EE  - t  +  Fm 

/-  a|<m 

—  E  +  Fm. 

\a\<m 


Here  the  coefficients  Ca  dehned  by 


a 


iM 


a\ 


N 


£ 


encode  the  sources  Sj  and  strengths  fj  into  M  moments,  and  \Fm\  <  eJ2\fj\ 
bounds  the  error.  All  M  moments  up  to  order  m  can  be  evaluated  in  O(MN) 
work,  while  the  resulting  polynomial  approximation  to  f(t)  can  be  evaluated 
at  N  targets  A  in  0(MN )  work.  Thus  the  rank -M  kernel  approximation 


exp  it1  s  = 


E 

\a\<m 


jM 

+  E» 
a\ 


(8) 


gives  an  O(N)  algorithm  with  a  constant  factor  O(M)  depending  polyloga¬ 
rithmically  on  the  accuracy  e. 

The  sources  Sj  and  targets  tk  are  not  conveniently  clustered  in  most  appli¬ 
cations.  Indeed,  in  the  classical  one- dimensional  Fast  Fourier  Transform  [1,2], 
Sj  =  27 xj  and  tk  =  k/N  for  1  <  j,  k  <  N,  so  0  <  t^Sj  <  2nN  — y  oo  as  — y  oo. 
Thus  we  employ  a  collection  of  expansions  (6)  centered  about  arbitrary  target 
centers  r  and  source  centers  er.  Each  expansion  represents  the  transform  due 
to  sources  Sj  in  a  cubical  cell  S  centered  at  s  =  a,  evaluated  at  targets  tk  in 
a  cell  T  centered  at  t  —  r: 


4, 
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exP(i tksj)fj  =  exp(i ttct  +  irT(sj  -  a)  +  i (4  -  r)T(si  -  a)  +  i(4  -  T)Ta)fj 


•(4  -  r)Q  exp(i(4  -  r)To-)  +  Fm 
=  ^(c,  r)(4  -  r)“  exp(i(4  -  t)t a)  +  Fm 

|a|<m 


where 

Ca{(T,  r)  =  exp(irTa)^— j-  ^  {sj  -  cr)aexp(i TT(Sj  -  a))fj  (9) 

a‘  *jes 

and  \Fm\  <  eJ2\fj\  if 

1(4  -  t)t(s7-  -  cr)\  <  R  where  ( — )  <e  (10) 

1  V  m  J 

for  all  tk  E  T  and  s3  G  S.  Thus  the  transform  due  to  sources  in  S,  evaluated 
at  targets  in  T,  is  approximated  with  a  kernel 

exp(irTo-)  (sf  ^  a)a  exp(irT(sj  -  a))  —  (4  -  r)a  exp(i(4  -  r)Ta)(ll) 

\a\<m 


of  rank  M  =  whenever  inequalities  (10)  hold.  The  accuracy  of  this 

approximation  is  controlled  by  the  geometry  of  S  and  T,  rather  than  the 
separation  required  by  classical  multipole  methods  [20,21],  In  this  paper,  we 
control  accuracy  by  controlling  the  relative  sizes  of  S  and  T,  but  it  is  also 
possible  to  divide  the  sources  and  targets  into  groups  with  selected  orientations 
9  in  tTs  =  ||t||||s||  cos  9. 

Butterfly  algorithms  apply  low-rank  approximate  kernels  (11)  to  spatial  clus¬ 
ters  of  sources  and  targets,  via  a  pair  of  translation  lemmas  that  shift  the 
coefficients  Ca(cr,  r)  of  (9)  to  smaller  target  cells  and  larger  source  cells.  These 
lemmas  are  proved  by  expanding  the  exponential  in  Taylor  series  and  applying 
the  binomial  theorem  respectively: 

Lemma  1  Given  sources  Sj  G  S  and  targets  4  £  T ,  the  coefficient  vector 
C(a,Ti)  G  CM  relative  to  a  smaller  target  cell  Tf  C  T  with  center  T\  is 
approximated  by  an  upper  triangular  matrix  multiply 

Ca{cr,  Ti)  =  exp(i(n  -  r)Ta)  ^  +  (ti  -  rfCp+a(a,  r) 

p> o  V  P  J 
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(12) 


=  Y1  Rap(<I,T  ~  Tl)Cf(lJ,T)  +  E 

a</3,  |/3|<m 


where 


Rap(cr,  T) 


exp(i  (rTa)) 


T 


(3— a 


for  (3  >a. 


If  \a\  <  m,  the  error  E  incurred  by  truncating  formula  (12)  after  terms  of 
order  \/3\  <  m  is  bounded  by  p~mejf  \fj\  whenever  |(ti  —  r)T(sj  —  cr)  |  <  Rj  p. 

The  error  estimate  for  truncating  the  infinite  series  follows  immediately  from 
inequality  (10)  since  7\  C  T. 

Lemma  2  Given  coefficients  C(ai,  r)  for  source  cell  S  and  target  cell  T\ ,  the 
Taylor  coefficients  Ca(a,r)  relative  to  a  larger  source  cell  S  D  Si  with  center 
o  are  given  by  an  exact  lower  triangular  matrix  multiply 


Ca(a,r)  =exp(irT(ai  -  a))  ^  -  cr)a-^Cp(cr1}  r) 

=  J2Lap(cr1-a,T)Cp(a1,T).  (13) 

/3<0' 


where 


j|a-/3| 

LQp(v,T)  =  exp(i  rTa)- - 

(a  -  (d)\ 


for  a  <  (3. 

After  using  Lemma  2  to  shift  coefficient  vectors  from  several  different  source 
subcells  S\,  Sn  to  a  common  superset  S  with  center  a,  the  resulting 
expansions  can  be  merged  into  a  single  expansion  by  adding  the  coefficients: 

n 

Cp(cr,  =  Lap(vj  -  cr,  r)C'/3(aj,  r). 

j= 1 P<a 


In  the  butterfiy  algorithm,  the  matrices  L(a  —  crj,  t-()  and  f?(oy,  r  —  r* )  operate 
on  2d  expansions  C(<Jj,T )  representing  the  combined  effect  of  sources  in  the 
2d  children  Sj  of  a  source  cell  S,  evaluated  at  targets  in  a  target  cell  T. 
The  result  is  another  set  of  2 D  expansions  C(cr,  rf)  representing  sources  in 
the  parent  source  cell  S  to  targets  in  each  of  the  2 D  children  Tt  of  T.  The 
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combined  operation  reduces  source  locality  and  increases  target  locality  by  a 
block  matrix-vector  multiply  with  2 D  M  x  M  blocks: 


2 D 

C{<7,  Ti )  =  L(aj  ~  Ti)R(<Jj,  T  -  Ti)C(<Jj,  t). 

3  =  1 


(14) 


The  butterfly  algorithm  presented  in  this  paper  factorizes  the  transform  (1)  by 
repeated  application  of  Eq.  (14)  to  moments  formed  with  groups  of  simplices 
and  densities.  The  factorization  process  is  abstracted  and  generalized  in  [22], 


2.2  Hierarchical  point  clustering 


The  expansion  shifting  and  merging  lemmas  of  subsection  2.1  operate  within 
a  data  structure  which  organizes  source  and  target  objects  S3  and  tk  into  cells 
S  and  T  so  that  inequality  (10)  is  locally  satisfied.  Collections  of  geometric 
objects  such  as  the  source  simplices  and  target  points  in  Eq.  (1)  can  be  effi¬ 
ciently  organized  into  local  groups  by  a  variety  of  tree-based  data  structures 
[23].  We  employ  a  2D-ary  tree  which  generalizes  quadtrees  and  octrees  to  ar¬ 
bitrary  space  dimension  D.  In  the  present  pointwise  case  where  d  —  0,  its 
construction  is  straightforward. 

Suppose  N  points  s3  e  RD  arc  all  contained  in  a  hypercubical  unit  cell  ,S'0 ; o  = 
[a  i  —  R,  G\  +  R]  x. .  .x  [<td'-R,<jd  +  R]  =:  <Jo,o  +  RQ  with  center  <70,o  and  width 
2 R.  Here  Q  denotes  the  unit  cell  [— 1, 1}D  and  +  denotes  addition  of  sets.  We 
can  build  a  L-level  tree  structure  Sl  from  the  root  downward  as  follows.  Let 
the  root  cell  be  Sop,  comprising  the  root  level  0  of  S.  For  level  l  —  0  to  L  —  1, 
for  J  =  0  to  2Dl  —  1,  divide  each  cubical  cell  Sj  j  =  aitj  +  2~lRQ  into  2D  cubical 
children  Si+X  2dj+j  =  ^i+i,2d  j+j+2~l~l RQ  for  j  =  0  to  2D  —  1,  and  record  them 
on  level  /  +  1.  As  each  child  cell  is  constructed,  construct  pointers  from  the  cell 
to  points  Sk  located  inside  it  and  vice  versa.  The  resulting  structure  Sl  can 
be  used  to  answer  many  range-type  queries  such  as  finding  nearest  neighbors. 
Its  O(NL)  cost  is  rarely  optimal  for  any  specific  task,  but  its  flexibility  and 
efficiency  has  made  it  ubiquitous  in  computational  science.  Fig.  2  illustrates 
the  construction. 
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(a)  Points  (b)  Tree  (c)  Cells 


Fig.  2.  Points  in  two  dimensions,  a  6-level  tree,  and  nonempty  child  cells. 
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3  A  pointwise  butterfly  algorithm 


A  butterfly  algorithm  for  the  pointwise  transform  (3)  can  be  based  on  low-rank 
expansion  and  hierarchical  point  clustering  [14,15,24-26].  It  localizes  sources 
and  targets  into  tree  structures,  forms  source-local  expansion  coefficients, 
shifts  and  merges  them  systematically,  and  evaluates  target-local  expansions. 
We  derive  these  pointwise  algorithms  in  four  steps  before  generalizing  to  the 
piecewise-polynomial  transform  (1)  in  Section  4. 

First,  we  build  spatial  tree  structures  Sl  and  71  for  the  source  and  target 
points.  These  structures  cover  the  data  regions  S  and  T  by  superimposed  layers 
l  =  0  (the  root)  to  L  (the  finest  level).  The  number  L  +  1  of  levels  required 
is  determined  by  \S\\T\  where  |S'|  and  \T\  are  the  diameters  of  the  source  and 
target  point  collections  respectively.  Layer  l  consists  of  2DI  rectangular  cells 
Sij  or  T[  j  numbered  from  0  to  2Dl  —  1,  whose  union  partitions  the  points  Sk 
or  tk ■  The  error  in  an  M- term  Taylor  expansion  (4)  of  the  exponential  kernel, 
for  sources  located  in  a  level-/  cell  of  Sl  and  targets  located  in  a  level-  (L  —  l ) 
cell  of  71  will  be  bounded  by  eX)  |  fj\  for  all  /  if  the  inequalities 

/  Rp  \  m 

2~l\S\\T\<R  and  (  — )  <e  (15) 

V  m  J 


are  satisfied.  Typically  2DL  =  0(N )  so  that  each  level- L  cell  contains  on  av¬ 
erage  0(1)  points.  However,  uniform  distribution  of  the  points  is  not  assumed 
and  does  not  alter  the  cost  of  the  algorithm. 


Second,  we  create  source-local  coefficient  vectors  C(<JL,j,  Tgo)  which  represent 
the  effect  of  sources  Sk  in  each  leaf  cell  Slj  with  center  <jlj  on  the  finest  level 
L  of  the  source  tree  Sl,  on  all  targets  tk  in  the  root  cell  T0)o  with  center  Tyo 
on  level  0  of  the  target  tree  71-  Thus 


Ca{&L,j,  7o,o) 


exP(irc loaL,j 


a\ 


J2  (sk  ~  aL,j)a  exp(ir0T’0(sfc 

L  ,j 


&L,j))fk- 


The  total  cost  of  coefficient  evaluation  is  0(NM )  since  each  source  contributes 
to  M  coefficients  in  a  single  leaf  cell  per  source.  Once  these  coefficients  have 
been  evaluated,  the  field  at  each  target  point  tk  could  be  accurately  approxi¬ 
mated  by  summing  the  source-local  expansions: 

^  : 2DL- 1 

f{tk)=  J2  eXp(i(4  -  T0fl)T<JL,j)  J2  Ca{<7L,j,T0,o)(tk-T0fi)a. 
j= 0  |a|<m 


However,  the  0(N'2)  cost  of  this  computation  would  be  prohibitive  since  2DL  = 
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0(N).  An  efficient  evaluation  scheme  requires  target-local  coefficient  vectors 
C (cr 0  0;  t L,j)  which  represent  the  effect  of  all  sources  to  each  leaf  cell  of  the 
target  tree  72,  and  evaluates  one  expansion 

f(tk)  =  exp(i (tk  -  rLJ)Ta0fi)  Coc{o o,o,  rL,j)(tk  ~  rLd)at 

\a\<m 


at  each  target  tk  in  72  j.  The  total  cost  of  evaluating  these  expansions  is 
0(NM )  since  each  target  evaluation  involves  M  coefficients. 

The  third  step  of  the  butterfly  algorithm  computes  the  target-local  coefficient 
vectors  C(cro,o,  tlj)  which  approximate  the  total  held,  at  targets  tk  in  the  hnest 
level- L  target  cells  72  j  with  centers  tlj,  due  to  all  the  sources  sk  in  the  source 
root  cell  S0,0  with  center  a0,o-  The  algorithm  proceeds  by  doubling  source  cells 
and  halving  target  cells,  and  therefore  requires  L  =  O  (log  IV)  substeps.  The 
substep  from  source  level  /  and  target  level  L  —  l  to  source  parent  level  l  —  1 
and  target  child  level  L  —  l  +  1,  consists  of  two  operations. 

In  the  first  operation,  on  source  level  /  and  target  level  (L  —  /),  for  J  =  0  to 
2Z)(i-0_i,  each  target  cell  Tl_i  j  is  split  into  2D  children  TL_l+l  2DJ+j  C  Tl_ij 
with  j  =  0  to  2D  — 1.  Each  coefficient  vector  relative  to  the  parent 

center  is  converted  to  another  coefficient  vector 

C(<Jiti,  TL_i+lt2DJ+j)  =  R(<Jiti,TL-l,J  —  TL^i+i:2Dj+j)C(aiti,.TL-l,j) 


relative  to  each  child  center  tl_i+1j2dj+j,  using  the  M xM  matrices 
tl-1+i,2d  j+j)  defined  in  Eq.  (12). 

In  the  second  operation,  each  group  of  2 D  level-/  source  cell  siblings  Sij  C 
Si- ij  with  I  =  |_?'/2dJ,  is  merged  into  their  parent  Si- ij.  For  J  =  0  to 
2 d{l-i+i)  _  each  coefficient  vector  C(ai:i,TL_i+i}2DJ+j)  relative  to  source 
child  center  oy,;  is  converted  to  a  coefficient  vector 

Bl{<Ji-ijiTL-i+i,2D  j+j)  —  L(a^i  —  cri-i,i,rL-l+l^DJ+j)C(aiti,TL-i+itj) 


relative  to  the  source  parent  center  cri-ij,  using  the  M  x  M  matrices  L(oi ^  — 
cp-iy, TL-l+12DJ+j)  dehned  in  Eq.  (13).  Since  the  2D  resulting  coefficient  vec¬ 
tors  Bl  have  the  same  source  center  a  =  cq_ iy),  and  the  same  target  center 
T\  =  TL_l+12DJ+j ,  they  are  summed  coefficient  by  coefficient: 


2° 

C(a,T1)  =  J2B\(Ti,i,Ti) 


i=  1 


Disfrf 
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2° 

=  Y  L(<Jiji  -  a,  ti)R(<ji tL-;,j  -  Ti)C(aiti,  tL-U). 

i=  1 

After  these  two  substeps,  target  cells  have  been  halved  and  source  cells  have 
been  doubled,  so  the  error  control  estimate  (10)  is  preserved.  Arranging  the 
fan-out  matrices  R  and  fan-in  matrices  L  symmetrically  as  in  Fig.  3  illustrates 
the  butterfly  algorithm  [?,?,?]. 


Fig.  3.  Interactions  between  opposite  levels  of  the  source  and  target  trees  give  the 
butterfly  algorithm  its  name. 

After  L  steps  of  the  butterfly  scheme,  a  coefficient  vector  C(cro,o,  tlj)  has  been 
computed  for  each  target  cell  TTjJ  on  the  forest  level  L,  and  each  coefficients 
vector  represents  the  held  due  to  all  the  sources,  evaluated  at  any  4  e  TL  j: 

f(tk)  =  exp  (i (tk  -  TLJ)Ta0t 0)  Y  CUfoto,  tLj)  (4  -  rLJ)a  (16) 

|a|<m 


Hence  the  transform  can  be  accurately  evaluated  at  each  target  4  hr  O(MN) 
time,  where  M  depends  polylogarithnrically  on  the  accuracy  required.  A  de¬ 
tailed  algorithm  is  exhibited  in  Fig.  4. 
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Fig.  4.  A  pointwise  butterfly  algorithm 

Step  1  -  Localization 

Sort  sources  s3  and  targets  tk  into  leaf  cells  of  L-level  trees  Sk  and  Tl 

Step  2  -  Compute  source-centered  coefficients 

for  j  =  0  . . .  2dl  —  1 

S  =  SLJ 

° '  =  aL,j 
T  =  T0,  o 
for  |  a  |  <  m 

CQ(cr ,  t )  =  exp(irT(r)^  T,Skes(sk  ~  cr)a  exp(i rT(sk  -  a))fk 

end  for 
end  for 


Step  3  -  Butterfly 

for  l  —  L  . . .  1 

for  1  =  0...  2Dl  —  1 
^  =  07—i,/ 

for  J  =  0  . . .  2D(i-d  -  1 
r  = 

for  j  =  0  . . .  2d  —  1 

T1  —  TL-l+l,2DJ+j 

C{(T,  Tl)  =  ELo'1  L(°l  ,I/2D+i  —  O',  Ti)R(ai  j/2D+i,  T  —  Ti)C(<Jij/2D+i,  T ) 

end  for 
end  for 
end  for 
end  for 


Step  4  -  Evaluate  target-centered  expansions 

for  j  =  0  . . .  2dl  —  1 
o'  =  o-0,o 
T  =  Tlj 

T  =  TL,j 

for  tk  e  Tlj 

f(tk )  =  exp(i (4  -  T)Ta))  Y,\a\<mCa((T,T)(tk  ~  t)° 

end  for 
end  for 


Disfri^i 
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4  Piecewise-polynomial  distributions 


Given  N  polynomial  densities  fj  on  d- dimensional  simplices  S3  C  RD,  we 
derive  a  fast  algorithm  for  evaluating  the  Fourier  transform 


(17) 


at  N  points  tk  €  RD.  Our  algorithm  combines  the  low-rank  kernel  approxi¬ 
mation,  tree  structures  and  butterfly  scheme  of  Sections  2  and  3  with  a  new 
dimensional  recurrence  for  evaluating  exponential-polynomial  moments  (Sec¬ 
tion  4.1.1)  and  a  flexible  tree  structure  for  localizing  simplices  (Section  4.1.2). 


Suppose  a  given  simplex  S3  is  contained  in  a  cubical  cell  S  =  a  +  RsQ  and  tk 
lies  in  a  cubical  cell  T  =  r  +  RTQ  where  R  =  RsRt  satisfies 


Re\m 
—  <e. 

m  J 


(18) 


Then 


jlaj  . 

exp(i t^s)  =  exp(irTa)  ^  — r  (s  —  cr)“  exp(irT(s  -  a))  (4  -  r)a  exp(i (4  - 


\a\<m 


where  |  Ern \  <  e.  Hence  integrating  over  S3  gives 

J  exp(i tls)fj(s)  ds  =  ^  Ca(a,  t)  (4  -  t)°  exp(i(4  -  t)t a)  +  Fm  (19) 


|a|  <m 


where  \Fm\  <  e  f  \  fj\  and 


Ca(cr,  r)  =  exp(iTT cr)1—  J  (s  -  cr)a  exp(irT(s  -  a))f3(s)  ds 


iM 


=  exp(irru)— Fa(d,  S3,  f3). 

Here  the  M-vector  F  of  exponential-polynomial  moments  is  dehned  by 

Fa(d,S,f)  =  J  (s  —  cr)“  exp(irT(s  —  a)) f(s)ds  (20) 
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)T& )  +  Em 


for  a  polynomial  density  /  on  a  simplex  S.  The  source  center  a,  target  center 
r,  and  ambient  dimension  D  are  omitted  from  the  notation  for  simplicity.  In 
Section  4.1.1,  we  derive  efficient  methods  for  evaluating  F. 


4-1  Exponential-polynomial  moments 


Fourier  transforms  of  piecewise-polynomial  data  over  simplices  involve  exponential- 
polynomial  moments  defined  by 

Fa(d,  S,  f)  =  J  exp(i tTs)(s  -  a)af(s)  d s,  (21) 

s 


computed  for  all  multiindices  a  with  |a|  <  m  and  a  given  degree-p  polyno¬ 
mial  /  defined  on  the  d- dimensional  simplex  S.  The  target  point  t,  ambient 
dimension  D  and  cell  center  a  are  omitted  from  notation  for  simplicity. 

For  example,  direct  evaluation  of  transform  (1)  without  the  butterfly  algo¬ 
rithm  is  equivalent  to  the  summation  of  N  moments  F(d,  Sj,  fj,  0)  of  order 
\a\  =  0  for  each  point  of  evaluation  t *..  Expansion  (19)  requires  moments 
F(d,  Sj,  fj,a)  for  all  orders  up  to  \a\  =  m,  with  t  —  r  ranging  over  various 
target  cell  centers.  Similar  moments  formed  with  polynomial,  exponential  or 
bandlimited  bases  occur  in  other  butterfly  algorithms  for  pointwise  data  with 
d  —  0.  When  d  >  0,  no  algorithm  is  known  for  their  exact  evaluation,  and  the 
rapid  oscillation  of  the  exponential  kernel  renders  them  resistant  to  standard 
numerical  quadratures. 

We  evaluate  these  moments  by  a  recursively  branching  three-step  procedure. 
First,  we  extract  the  variation  of  the  exponential  perpendicular  to  S  (subsec¬ 
tion  4.1.1).  Second,  if  the  remaining  parallel  variation  is  small,  we  evaluate  the 
moment  directly  by  numerical  quadrature  (Section  4.1.2).  Third,  we  reduce 
the  simplex  dimension  d  by  dimensional  recurrence  (Section  4.1.3),  continuing 
recursively  until  d  =  0  or  the  remaining  parallel  variation  is  small. 


4-1.1  The  perpendicular  variation 

If  the  simplex  S  has  positive  dimension  and  codimension,  so  0  <  d  <  D, 
then  moments  (21)  are  simplified  by  extracting  the  part  of  the  target  vector  t 
perpendicular  to  the  d- dimensional  affine  hyperplane 

d 

H  =  {s  =  v0  +  J2  ^(l’i  ~  w)  I  &i  e  R} 

i= 1 
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containing  the  domain  of  integration 


d 


d 


S  =  {no  +  ~  l’o)  I  Oi  >  0,  -  X}- 

1=1  1=1 


(22) 


Define  the  D  x  d  full-rank  matrix  V  to  have  columns  vt  —  v(j  E  for  %  —  1  to 
d,  so  that  the  simplex  S  is  parametrized  by  s  =  vq  +  V6  where  9  varies  over 
the  standard  d- dimensional  simplex 

d 

'S'o  =  {(#1,  •  •  • ,  9d)  |  Oi  >  0,  Oi  <  !}• 

2—1 


Then 


H  =  {s  =  Vo  +  Ve  \6e  Rd} 


and  the  moments  (21)  are  given  by 


Fa(d,  S,  f)=vol(S)  J exp(i tT(v0  +  V9))(v0  +  V9-  cr)af(v0  +  VO)  d 0 

So 

—  wol(S)  exp(it{y0)  J  exp(h|,(u0  +  V0)(vo  +  VO  —  a)af(v0  +  VO)  d 0 

So 


where  vol(S')  =  y  det(VTV)  [27].  Here  t  has  been  decomposed  into  its  projec¬ 
tions  t  l  and  f||,  respectively  perpendicular  to  and  parallel  to  the  hyperplane 
H  containing  S.  Computationally,  t\\  =  V(VTV)~1VTt  satishes  an  overde¬ 
termined  least  squares  problem  solved  by  applying  the  pseudoinverse  or  left 
inverse  of  V.  The  perpendicular  component  t±  =  t  — 1\\  lies  in  the  nullspace  of 
VT  and  hence  plays  no  role  in  the  variation  of  the  integrand  over  the  simplex 
(22).  Returning  to  the  original  variables  thus  gives 

Fa(d,  S,  f )  =  exp(i t^y0)Ga(d,  S,  f)  (23) 

where  the  parallel  moment  vector  G  is  defined  by 

Ga(d,  SJ)  =  J  exp(i  tj\s)(s  -  cr)af(s)  d  s.  (24) 

s 


Henceforth  we  focus  on  evaluation  of  G. 
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4-1-2  Small  parallel  variation 


When  ||GTt||  ||  =  ||ETt||  is  small,  the  variation  of  the  exponential  factor  exp(ifjj’s) 
in  the  integrand  of  G  is  small,  so  numerical  quadrature  will  be  extremely  ac¬ 
curate.  For  example,  if  ||RTf||  <  e  then  the  exponential  factor  of  the  inte¬ 
grand  in  Eq.  (24)  can  be  replaced  by  exp(ifjj’no)  with  relative  error  0(e),  by 
exp(it|A0)(l  -l-itjj’s)  with  relative  error  0(e2),  and  so  forth.  It  is  not  necessary 
explicitly  to  carry  out  this  replacement,  however,  because  small  parallel  vari¬ 
ation  guarantees  the  accuracy  of  an  appropriate  quadrature  rule.  Such  rules 
have  been  extensively  developed  and  occur  in  many  variants  [28].  Our  imple¬ 
mentation  employs  suboptimally  accurate  but  convenient  Grundmann-Moellcr 
(GM)  rules  [29],  which  can  be  generated  in  arbitrary  simplex  dimension  and 
degree  of  exactness.  A  GM  rule  which  is  exact  for  the  Q  =  (p+'m+Q+d'j  poly¬ 
nomials  of  degree  p  +  m  +  q  on  S  will  yield  0(\\VTt\\  ||'?+1)  accuracy  for  M 
moments  up  to  order  rri,  at  a  cost  of  0(QM )  arithmetic. 

Specification  of  quadrature  rules  also  suggests  a  specific  choice  of  basis  for 
polynomials  /  of  degree  p  on  a  d-dimensional  simplex  in  RD.  The  Bernstein- 
Bezier  basis,  for  example,  facilitates  geometric  operations  on  curved  surfaces 
[30],  and  has  recently  been  employed  in  the  geometric  nonuniform  fast  Fourier 
transform  [12]  and  in  the  finite  element  method  [31].  We  employ  the  convenient 
and  flexible  Lagrange  representation  which  parametrizes  /  by  polynomial  val¬ 
ues  at  equispaced  points  in  each  simplex.  Thus  a  degree-p  polynomial  /  is 
represented  by  an  P- vector  of  values  f(sa ),  where  P  =  and  typical 

equidistant  point  sets  sa  are  shown  in  Fig.  5. 

Lagrange  representation  simplifies  evaluation  at  the  Q  equidistant  quadrature 
points  of  GM  rules,  and  trivializes  transformation  to  the  reference  simplex 
So-  In  this  representation,  linear  operations  such  as  differentiation,  restric¬ 
tion  to  simplex  boundaries,  or  multiplication  by  shifted  monomials  (s  —  cr)a 
are  straightforward  and  stable.  It  is  also  convenient  when  the  densities  fj 
are  continuous  functions  such  as  the  exponentials,  sinusoids,  and  logarithms 
commonly  encountered  in  applications. 


4-1-3  Dimensional  recurrence 

When  the  variation  is  not  small,  we  evaluate  the  parallel  moments  G  in  Eq. 
(23)  by  a  dimensional  recurrence  based  on  the  Gauss  formula  for  multidimen¬ 
sional  integration  by  parts.  Given  a  vector  h  parallel  to  the  d- dimensional 
affine  hyperplane  H  containing  S,  and  a  smooth  function  <p  on  RA,  the  Gauss 
formula  reads 

J  hT\7<p(s)  ds  =  J  hTn<p(s)  d.s, 
s  as 
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where  the  boundary  d S  and  outward  unit  normal  vector  n  of  S  are  defined 
relative  to  H.  (See  Fig.  6.) 

Applying  the  Gauss  formula  to  the  triple  product  e(s)f(s)g(s )  =  exp(itjjA)/(s)(s 
er)“,  where  /?TVe(s)  =  ihTt\\e(s)  and  g(s )  =  (s  —  cr)a  is  a  polynomial  of  degree 
|  a  |  <  m,  gives 


J  hT V  (ef  g)  =  J  hFnefg 

s  dS 


Fig.  5.  P  =  (p^i<l)  =  2,  3, 4,  5, 3,  6,  fO,  f5  equally  spaced  points  for  the  representation 
of  polynomials  of  degree  p  =  1  through  4  on  simplices  of  dimension  d  =  1  and  2. 
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s 


(25) 


ihTttefg  +  ehTVfg  +  efhTVg. 


Solving  for  fs  efg  gives 


1 

i  hTt 


J  hTnef  g  -  J  ehTWfg  -  J  efhT\/g 

as  s  s 


Fig.  6.  Vertices  v and  outward  unit  normal  vectors  n k  relative  to  the  affine  hyper¬ 
plane  H ,  for  a  simplex  with  d  =  2  in  ambient  dimension  D  =  3. 
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or 


J  e((J  +  zTW)f)g  =  j  zTnef  g  -  J  efzT\7g  (26) 

S  dS  s 

where  z  =  h/(ihTt\\)  and  /  is  the  identity  operator.  Since  (26)  holds  for  all 
polynomials  /  and  g.  it  holds  with  /  replaced  by  the  degree-p  polynomial 

fi  =  (i  +  zTv)  ^  f  =  (/  -  zT \7  +  (zTV)2  -■■■  +  (-1  nzTvy)  /. 


The  sum  terminates  because  deg (/)  <  p.  Here  /  +  z7  V  is  an  invertible  linear 
operator  on  the  P-dimensional  space  of  polynomials  of  degree  <  p,  since  all 
eigenvalues  of  the  operator  zT'V  are  0.  Since  (/  +  z1  V)/i  =  /,  Eq.  (26)  yields 

f  ef9  =  f  zTn  efig  -  J  efxzT^J g  =  J  zTn  ej\g0  +  J  ef1g1  (27) 

S  dS  S  dS  s 

where  go  =  g  and  gu  =  (— z1  V)kg  is  a  polynomial  of  degree  <  m  —  k  for  k  >  1. 

Iterating  the  recurrence  relation  (27)  eventually  eliminates  the  integrals  over 
S  completely  since  gm+  \  =  0,  yielding  a  dimensional  recurrence 

J  efa  =  f  zTn  e  (fxg0  +  f2gx  H - h  fm+igm)  (28) 

s  as 


where  fk  —  (I  +  zTV)~kf  for  k  >  1.  Each  g3  term  is  a  linear  combination  of 
moments  of  fJ+\  with  order  rn  —j ,  so  the  dimensional  recurrence  (28)  expresses 
order-m  moments  of  a  degree-p  polynomial  over  a  simplex  S  of  dimension  d, 
as  linear  combinations  of  order-  (m  —  r)  moments  of  (m  +  1)  different  degree-p 
polynomials  fr  over  d  +  1  simplices  dkS  of  dimension  d  —  1.  Here  the  boundary 
dS  =  Udk=0dkS  consists  of  d  +  1  oriented  simplices  dkS  of  dimension  d  —  1, 
formed  by  omitting  vertex  k  —  0  through  d  successively. 

Define  the  D-vector  of  moment  shift  operators  E  =  (Ex, . . . ,  ED)  such  that 

D 

zTV(s  -  a)a  =  ztE(s  -  a)a  =  ^  Zjaj(s  -  a)a~ei 

3= 1 


and 


zTEGQ(d,  S,f)  =  J2  ZjGa-eM ,  S,  f ) 

3= 1 
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for  a.j  >  0.  Let  G(d,  S,  f)  denote  the  M-vector  of  moments  Ga(d,  S,  /).  Then 
the  dimensional  recurrence  (28)  reads  explicitly 


G(d,  S,f)  =  ±  zTnk(G(d  -  1,  dkS,  fc)  -  zTEG(d  -  1,  dkS,  f2)  (29) 

k= 0 

+  {-zTE)2G(d  -  1,  dkS,  h)  +  •  •  •  +  {-zTE)mG(d  -  1,  dkS,  fm+1)) 

d  m 

=  E  zTn k  Y,(-zTE)rG(d  -  1,  dkS,  fr+1)  (30) 

k= 0  r=0 

For  a  single  moment  of  order  |a|  =  0,  as  in  direct  evaluation  of  the  Fourier 
transform  (1),  the  double  sum  (29)  simplifies  to 

G(d,  S,f)  =  jr  zTnk  G(d  -  1,  dkS,  h).  (31) 

k= 0 


Thus  the  direct  transform  is  exactly  the  sum  of  d  +  1  direct  transforms  of 
degree-p  polynomials  f\  over  (d—  l)-dimensional  simplices  dkS.  The  recurrence 
terminates  when  the  dimension  of  the  simplex  reaches  0,  as  the  simplices 
then  become  points.  Thus  the  cost  of  the  direct  transform  on  d-dimensional 
simplices  is  equivalent  to  (d  +  1)!  pointwise  direct  transforms.  (Note  that  this 
recurrence  does  not  convert  standard  fast  algorithms  for  points  where  d  =  0 
to  algorithms  for  simplices  with  d  >  0,  because  the  transformation  /  — >  fi 
depends  on  the  target  t .) 

An  alternate  dimensional  recurrence  can  be  obtained  by  interchanging  the 
roles  of  /  and  g.  Such  an  interchange  leads  to  a  shorter  recurrence 

Jefg  =  f  zTn  e  (V f  +  g2 f1  +  ■  ■  ■  +  gp+1fp)  (32) 

S  dS 

if  p  <  m,  where  now 

gk  =  (I  +  zTV)-kg,  f  =  (-zTV)kf. 


Each  gk  is  a  linear  combination  of  order-m  moments,  so  the  dimensional  recur¬ 
rence  (32)  expresses  order-m  moments  of  a  degree-p  polynomial  over  a  simplex 
S  of  dimension  d,  as  linear  combinations  of  order-m  moments  of  (p+1)  degree- 
(p  —  r)  polynomials  fr  over  d  +  1  simplices  dkS  of  dimension  d  —  1.  Explicitly, 

G(d,  S,f)  =  ±  zTnk((I  +  zTE)-1G(d  -  1,  dkS,  f°) 

k= 0 
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+  (/  +  zTE)~2G(d  -  1,  dkS,  f1)  +  •••  +  (/  +  zTE)~P~1G(d  -  1,  dkS,  fp)) 

d  p+ 1 

=  E  E(J  +  zTE)~r~lG{d  -  1,  dkS,  fr).  (34) 

fc=0  r=  1 

Fig.  7  displays  a  G  evaluation  scheme  which  combines  quadrature  for  small 
parallel  variation  with  recurrence  (34)  for  large  parallel  variation. 

Fig.  7.  Recursive  evaluation  of  G  =  G(d ,  5,  /)  by  quadrature  and  recurrence, 

if  H^Eill  <  e 

Generate  quadrature  rule  of  order  p  +  m 
Evaluate  G(d,  S,  /)  by  quadrature 

else 


G  =  0 

f°  =  f 

for  r  =  1 . . .  p  +  1 
fr  =  (~zT'V)fr~1 
for  k  =  0  . . .  d 

G  —  G  +  (I  +  zT E)~r~1G(d  -  1,  dkS,  fr) 

end  for 
end  for 
end  if 


4-1.4  Computational  cost  and  stability 

The  total  computational  effort  W(m,p,  d,  D)  required  to  evaluate  all  order-m 
moments  of  a  degree-p  polynomial  on  a  dimension- d  simplex  in  can  now 
be  bounded.  Using  the  dimensional  recurrence  (28),  for  example,  requires  a 
P  x  P  matrix- vector  multiply  to  generate  each  of  the  m  auxiliary  polynomials 
fr+1.  Then  M  moments  G(d,S,f)  for  \a\  <  m  are  linear  combinations  of 
O(M)  boundary  moments  of  order  m  —  r  for  fr+l  as  r  ranges  from  0  to  m. 
Hence  the  total  cost  W(m,p,d,D)  for  evaluating  all  M  moments  in  simplex 
dimension  d  satisfies 

W(m,p,  d,  D)  <  mpd  +  0(m2D+1)W(d  —  l,m)  <  0(mpd  +  m1+d(2D+1)) 


Although  the  exponent  is  large,  the  cost  of  moment  evaluation  is  much  smaller 
in  practice  because  many  branches  of  the  dimensioonal  recurrence  terminate 
with  quadrature. 

Both  recurrences  (28)  and  (32)  are  numerically  stable  when  z  is  small  and  the 
terms  in  the  formally  infinite  sum  defining  (J+^TV)_1  rapidly  decrease  to  zero. 
The  natural  choice  h  =  t\\  makes  z  =  —  h||/||t|| ||2  small  when  t\\  is  large.  When 
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the  variation  ||t||||  is  small,  of  course,  the  recurrence  can  be  highly  unstable 
and  numerical  integration  is  employed  as  in  4.1.2. 


4-2  Tree  structures 


Two  additional  ingredients,  approximation  and  remaindering,  are  necessary  to 
localize  geometric  objects  such  as  simplices  into  spatial  tree  structures  similar 
to  those  of  Section  2.2,  with  cells  satisfying  inequality  (18). 

Unlike  points,  a  given  simplex  may  not  lie  exactly  within  a  given  cubical  cell 
Sij.  Thus  we  define  a  simplex  S  to  lie  e-approximately  within  a  cell  a  +  RQ 
if  each  vertex  v  of  S  satisfies  \vj  —  <jj\  <  (1  +  e)R,  for  j  =  1  to  T.  Our  tree 
construction  procedure  initially  assigns  all  simplices  to  the  root  cell  Sop-  Then 
simplices  are  e-approximately  assigned  to  child  cells  when  possible. 

However,  some  simplices  may  be  too  large  or  too  awkwardly  placed  to  fit  into 
any  cell  on  a  given  level  l.  See  Figs.  8  and  9  for  examples.  Such  simplices  simply 
remain  in  the  smallest  possible  source  cell,  and  contribute  into  the  coefficients 
later.  (They  can  additionally  be  subdivided  by  the  algorithm  of  [12]).  The 
butterfly  algorithm  collects  these  simplices  at  each  level  as  it  works  its  way 
up  and  down  the  source  and  target  trees.  After  each  split-merge  step,  source 
simplices  remaining  in  each  source  cell  Si_u  are  folded  into  the  coefficient 
vector  for  each  target  cell  TL_i+1j  via 


Ca{&l-l,I,  TL-l+l,j)  —  Ca ((T;_i  / ,  TL-l+lj)  (35) 

il«l 

SfcCSi-!,/  a- 

The  cost  of  the  algorithm  is  unaffected  by  this  modification  if  the  number  of 
remaindered  simplices  is  0(1). 

A  detailed  piecewise-polynomial  butterfly  algorithm  is  exhibited  in  Fig.  10. 


Disfriti 
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Fig.  8.  A  spatial  tree  structure  for  geometric  data  with  d  =  D  =  2  and  L  =  4  levels. 
Simplices  of  various  sizes  are  sorted  into  cells  of  a  tree  structure  with  an  overlap 
tolerance  of  5  percent,  leaving  a  few  awkwardly  placed  simplices  in  non-leaf  cells  on 
every  level.  Almost  all  of  the  simplices  are  e-approximately  assigned  to  leaf  cells  on 
Level  4. 


(a)  Points  (b)  Segments  (c)  Triangles 

Fig.  9.  Spatial  tree  structures  for  points,  segments  and  triangles 
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Fig.  10.  A  piecewise-polynomial  butterfly  algorithm 

Step  1  -  Localization 

Sort  source  simplices  Sj  and  target  points  tk  into  minimal  cells  of  L-lcvcl  trees  Sl  and  7l 

Step  2  -  Compute  source-centered  coefficients 

for  j  —  Q...  2dl  —  1 

S  =  Sl j 

(?  =  <Jl.j 
r  =  r0,  o 
for  |  a  |  <  m 

Ca(cr,T )  =  exp(irTor)i^-  Esfccs/sfc(s  ~  exp(irT(s  -  a))fk(s )  ds 

end  for 
end  for 

Step  3  -  Butterfly  with  remainder 

for  l  =  L  .. .  1 

for  1  =  0...  2Dl  —  1 

5  = 

<7  =  01-1,7 

for  J  =  0  . . .  2D('L~l')  -  1 
r  =  tL-i}J 
for  j  =  0  . . .  2d  —  1 

rl  —  TL-l+l,2DJ+j 

C((T,  n)  =  eLo"1  ,I/2D+i  ~  ai  Tl)R(IJl,I/2D+i,  T  —  Tl)C{criJ/2D+ii  T) 

for  Sk  C  S 
for  |  a  |  <  m 

Ca(cr,  Ti)  =  Ca{cr,  n)  +  exp(ir1T cr)§  fSk(s  -  a)aexp(ir[ ( s  -  cr))fk(s)  ds 

end  for 
end  for 
end  for 
end  for 
end  for 
end  for 


Step  4  -  Evaluate  target-centered  expansions 

for  j  =  0  . . .  2dl  —  1 

a  —  0o,o 
T  =  Tlj 

T  =  TL,j 

for  tk  G  Tlj 

f(tk)  =  exp(i(4  -  T)Ta))  E|a|<m  Ca(cr,  r)(tk  -  r)Q 

end  for 
end  for 
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5  Numerical  results 


The  piecewise-polynomial  butterfly  algorithm  (10)  has  been  implemented  in 
the  Fortran  77  programming  language,  for  simplices  of  arbitrary  dimension  d  in 
arbitrary  ambient  dimension  D,  with  multidimensional  arrays  mapped  to  one¬ 
dimensional  arrays.  Linear  operations  such  as  differentiation,  integration  and 
interpolation  of  polynomials  are  implemented  by  matrix-vector  multiplications 
with  precomputed  matrices. 

As  a  result  of  this  dimensional  generality,  our  present  implementation  is  sub- 
optimally  efficient,  and  all  timing  results  should  be  viewed  as  preliminary.  We 
plan  to  investigate  speedups  such  as  parallelization  in  future  work. 

The  accuracy  and  efficiency  of  the  implementation  have  been  verified  on  a 
gallery  of  test  cases  including  simplices  of  dimensions  0  through  2  in  ambient 
dimensions  1  through  3.  All  results  below  were  obtained  with  the  gfortran 
compiler  on  a  single  Intel  Xeon  E5-2670  v2  processor  with  64  GB  of  memory. 


5. 1  Discrete  points  in  RD 

Fig.  (10)  evaluates  the  pointwise  nonuniform  Fast  Fourier  Transform  (3)  in 
the  case  when  the  source  simplex  dimension  d  —  0.  We  tested  the  pointwise 
transform  on  N  random  source  and  target  points  distributed  over  source  and 
target  domains  of  size  0(1)  and  O(N)  respectively.  Table  1  shows  the  CPU 
times  Ts  required  to  achieve  s  =  3,  6,  9  and  12-digit  accuracy,  as  well  as  the 
CPU  time  Td  required  by  direct  evaluation.  The  CPU  time  Tp  required  for  a 
standard  Ap-point  complex  D-dimensional  FFT  [32]  is  also  shown.  The  FFT 
timings  are  monotonized  by  choosing  Np  to  be  the  smallest  product  of  powers 
of  2,  3  and  5  which  is  larger  than  N .  Tables  2  and  3  show  the  CPU  times  Ts  for 
D  —  2  and  D  =  3,  with  asterisks  indicating  cases  where  the  butterfly  algorithm 
required  more  memory  than  the  4  GB  installed  in  our  current  workstation. 
We  conclude  that  the  pointwise  butterfly  algorithm  is  much  faster  than  direct 
evaluation  even  for  small  problem  sizes  N  and  maximum  twelve-digit  accuracy, 
in  D  —  1  dimensions.  In  D  =  2  dimensions  it  is  faster  for  all  problem  sizes 
N  at  three-digit  accuracy,  but  twelve-digit  accuracy  does  not  break  even  until 
N  =  8100.  In  D  =  3  dimensions  it  is  faster  for  all  problem  sizes  N  at  three-digit 
accuracy,  while  six-digit  accuracy  does  not  break  even  until  N  =  5832.  The 
memory  limitations  of  our  current  workstation  slow  the  butterfly  algorithm 
considerably  for  higher  accuracy  in  three  dimensions,  so  that  it  never  breaks 
even  with  direct  evaluation  if  more  than  six  digits  are  requested.  Hence  our 
higher- accuracy  results  in  two  and  three  dimensions  tend  to  understate  the 
performance  of  the  butterfly  algorithm;  it  should  run  orders  of  magnitude 
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faster  in  less  constrained  computing  environments. 

Compared  to  the  classical  FFT  for  Np  uniform  points,  the  butterfly  algorithm 
is  asymptotically  within  two  orders  of  magnitude  for  three-digit  accuracy. 
Since  our  implementation  operates  in  arbitrary  simplex  dimension  and  ambi¬ 
ent  dimension,  an  optimized  implementation  might  run  ten  times  faster,  and 
compare  more  favorably  with  a  standard  uniform  FFT.  The  cost  of  the  but¬ 
terfly  algorithm  is  a  polylogarithmic  function  of  the  accuracy,  and  our  results 
suggest  that  doubling  the  accuracy  less  than  doubles  the  cost  (in  D  —  1). 

Table  1 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  N  random  points  with  d  =  0 
in  ambient  dimension  D  =  1. _ 


N 

Tf 

Td 

t3 

n 

t9 

T,2 

729 

0.00008 

0.087 

0.006 

0.009 

0.012 

0.020 

5832 

0.00063 

5.714 

0.057 

0.098 

0.132 

0.216 

10935 

0.00120 

19.789 

0.110 

0.183 

0.256 

0.329 

16038 

0.00180 

42.655 

0.162 

0.254 

0.413 

0.580 

21141 

0.00240 

74.195 

0.231 

0.392 

0.513 

0.703 

26244 

0.00300 

113.856 

0.278 

0.451 

0.629 

1.026 

31347 

0.00359 

163.317 

0.359 

0.554 

0.931 

1.163 

36450 

0.00439 

224.520 

0.430 

0.678 

0.998 

1.391 

41553 

0.00498 

290.851 

0.502 

0.772 

1.107 

1.537 

46656 

0.00537 

369.967 

0.528 

0.875 

1.225 

2.014 

3is{n 
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Table  2 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  N  random  points  with  d  =  0 
in  ambient  dimension  D  =  2. _ 


N 

Tf 

Td 

Ts 

n 

t9 

T\  2 

729 

0.00018 

0.100 

0.035 

0.158 

0.418 

0.938 

2304 

0.00048 

0.990 

0.087 

0.481 

1.665 

3.464 

4761 

0.00098 

4.091 

0.165 

1.056 

2.974 

7.028 

8100 

0.00178 

11.865 

0.281 

1.522 

4.813 

11.322 

12321 

0.00273 

28.155 

0.394 

2.723 

7.261 

15.510 

17424 

0.00391 

55.471 

0.681 

3.100 

10.457 

23.183 

23409 

0.00547 

100.128 

0.717 

4.496 

13.668 

33.479 

30276 

0.00703 

166.163 

0.869 

5.145 

18.234 

39.695 

38025 

0.00937 

259.937 

0.945 

6.598 

20.500 

53.227 

46656 

0.01250 

408.240 

1.578 

8.313 

29.188 

61.246 

Table  3 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  N  random  points  with  d  =  0 
in  ambient  dimension  D  =  3. _ 


N 

Tf 

Td 

Ts 

Tq 

t9 

T\  2 

729 

0.00035 

0.114 

0.129 

0.816 

5.859 

23.207 

1728 

0.00085 

0.675 

0.242 

3.063 

12.023 

50.125 

3375 

0.00347 

2.637 

0.344 

4.563 

21.141 

77.188 

5832 

0.00586 

6.379 

1.031 

6.516 

29.531 

191.641 

9261 

0.01875 

17.364 

1.250 

8.563 

73.594 

248.813 

13824 

0.01875 

38.880 

1.938 

20.438 

92.281 

4219.156 

19683 

0.03750 

73.811 

2.375 

26.844 

290.906 

*** 

27000 

0.03750 

151.875 

2.813 

31.375 

565.625 

*** 

35937 

0.03750 

247.067 

3.125 

37.500 

615.000 

*** 

46656 

0.03750 

408.240 

3.438 

41.313 

*** 

*** 
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5.2  Line  segments  in  R'0 

We  tested  the  butterfly  algorithm  on  n  randomly  placed  line  segments  in 
RD,  setting  each  fj  to  a  random  cubic  polynomial  and  evaluating  f(tk)  at  N 
random  points  tk  G  R"0.  Here  N  is  the  total  number  of  degrees  of  freedom 
in  the  input,  which  for  cubic  polynomials  on  line  segments  is  4 n,  and  Np  is 
again  the  smallest  multiple  of  powers  of  2,  3  and  5  which  is  larger  than  N. 

As  in  the  case  of  points  where  d  —  0,  the  butterfly  algorithm  compares  very 
favorably  with  direct  evaluation.  For  six-digit  accuracy  it  beats  direct  evalua¬ 
tion  for  every  problem  size  N,  by  three  orders  of  magnitude  when  N  =  186624. 
Since  the  number  of  degrees  of  freedom  is  larger,  the  standard  FFT  has  more 
work  to  do  and  the  butterfly  algorithm  does  better  by  comparison.  For  three- 
digit  accuracy  in  D  >  1,  it  runs  as  fast  as  25  FFTs. 

Table  4 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  cubic  polynomials  on 
n  =  N/ 4  random  segments  with  d  =  1  in  ambient  dimension  D  =  1. 


N 

Tf 

Td 

t3 

n 

Tg 

Tv2 

2916 

0.00030 

0.948 

0.006 

0.010 

0.014 

0.022 

23328 

0.00260 

60.819 

0.060 

0.098 

0.136 

0.228 

43740 

0.00500 

214.415 

0.125 

0.205 

0.286 

0.356 

64152 

0.00740 

460.701 

0.189 

0.294 

0.446 

0.574 

84564 

0.00981 

798.362 

0.235 

0.397 

0.581 

0.757 

104976 

0.01240 

1230.700 

0.321 

0.507 

0.692 

1.080 

125388 

0.01475 

1760.514 

0.390 

0.622 

0.842 

1.218 

145800 

0.01660 

2375.657 

0.430 

0.743 

1.065 

1.355 

166212 

0.01992 

3084.012 

0.503 

0.847 

1.192 

1.499 

186624 

0.02109 

3891.037 

0.527 

0.956 

1.323 

2.074 
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Table  5 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  cubic  polynomials  on  N 
random  segments  with  d  =  1  in  ambient  dimension  D  =  2. 


N 

Tf 

Td 

Ts 

Tq 

Tg 

T\  2 

2916 

0.00059 

1.068 

0.024 

0.118 

0.409 

0.865 

9216 

0.00195 

10.598 

0.090 

0.464 

1.201 

2.739 

19044 

0.00391 

45.378 

0.176 

0.742 

2.500 

6.881 

32400 

0.00703 

134.156 

0.223 

1.299 

4.814 

10.311 

49284 

0.01172 

302.731 

0.412 

2.041 

7.145 

14.998 

69696 

0.01641 

607.117 

0.455 

2.979 

9.045 

22.514 

93636 

0.02031 

1093.639 

0.770 

4.555 

13.367 

28.801 

121104 

0.02813 

1842.578 

0.832 

5.066 

14.195 

38.488 

152100 

0.03750 

2954.364 

0.973 

5.391 

20.215 

41.859 

186624 

0.04219 

4441.432 

1.082 

6.969 

24.133 

60.227 

Table  6 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  cubic  polynomials  on  N 
random  segments  with  d  =  1  in  ambient  dimension  D  =  3. 


N 

Tf 

Td 

Ts 

Tq 

Tg 

T\  2 

2916 

0.00142 

1.196 

0.133 

0.766 

3.469 

12.797 

6912 

0.00361 

6.885 

0.188 

1.336 

11.797 

25.906 

13500 

0.00469 

26.631 

0.359 

3.523 

20.883 

74.656 

23328 

0.00937 

80.190 

0.484 

6.359 

29.500 

107.219 

37044 

0.01875 

199.690 

0.578 

8.641 

74.047 

229.078 

55296 

0.01875 

444.960 

1.563 

10.344 

48.969 

687.000 

78732 

0.03750 

898.037 

1.813 

12.594 

112.000 

4574.969 

108000 

0.03750 

1704.375 

2.813 

15.188 

329.375 

*** 

143748 

0.03750 

3009.724 

3.250 

39.563 

609.938 

*** 

186624 

0.03750 

5073.840 

3.813 

45.625 

4399.375 

*** 
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5.3  Triangles  in  HD 

We  also  tested  the  butterfly  algorithm  on  n  randomly  placed  triangles  in 
R°,  setting  each  fj  to  a  random  cubic  polynomial  and  evaluating  f(tk )  at  N 
random  points  in  R/h  Here  N  is  the  total  number  of  degrees  of  freedom  in 
the  input,  which  for  cubic  polynomials  on  triangles  is  lOn. 

As  in  the  case  of  segments,  the  butterfly  algorithm  compares  very  favorably 
with  direct  evaluation.  For  six-digit  accuracy  it  beats  direct  evaluation  for 
every  problem  size  N,  by  three  orders  of  magnitude  when  N  =  466560.  Fur¬ 
thermore,  the  butterfly  algorithm  requires  only  10  FFTs  to  obtain  three-digit 
accuracy  in  ambient  dimension  D  =  2. 

Table  7 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  cubic  polynomials  on 
n  =  N/ 10  random  triangles  with  d  =  2  in  ambient  dimension  D  =  2. 


N 

TF 

Td 

t3 

n 

Tg 

T,2 

7290 

0.00154 

6.400 

0.029 

0.133 

0.438 

0.943 

23040 

0.00562 

66.414 

0.070 

0.434 

1.285 

2.869 

47610 

0.01045 

274.862 

0.145 

0.807 

2.405 

5.791 

81000 

0.01992 

786.507 

0.264 

1.410 

4.948 

11.447 

123210 

0.03047 

1837.802 

0.323 

2.271 

6.012 

15.571 

174240 

0.04219 

3651.893 

0.391 

3.201 

9.545 

22.889 

234090 

0.06094 

6772.151 

0.676 

3.623 

13.414 

32.809 

302760 

0.07969 

11314.472 

1.098 

5.680 

15.449 

36.672 

380250 

0.10938 

18017.314 

1.215 

6.094 

22.102 

50.063 

466560 

0.13125 

27191.700 

1.344 

7.688 

23.680 

64.141 
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Table  8 

CPU  times  Ts  for  s  =  3,  6,  9  and  12-digit  accuracy,  for  cubic  polynomials  on 
n  =  IV/ 10  random  triangles  with  d  =  2  in  ambient  dimension  D  =  3. 


N 

Tf 

Td 

t3 

n 

Tg 

T,2 

7290 

0.00361 

7.233 

0.063 

0.797 

3.797 

13.508 

17280 

0.00000 

41.040 

0.219 

1.406 

6.859 

27.203 

33750 

0.01875 

155.039 

0.297 

4.672 

18.469 

79.641 

58320 

0.01875 

472.027 

0.500 

6.500 

29.938 

111.938 

92610 

0.01875 

1172.095 

0.656 

9.000 

39.406 

253.781 

138240 

0.03750 

2609.280 

0.844 

11.219 

51.781 

3070.313 

196830 

0.07500 

5265.203 

2.063 

14.188 

116.500 

4854.438 

270000 

0.11250 

9973.125 

2.438 

17.188 

342.938 

*** 

359370 

0.15000 

17182.378 

2.688 

21.438 

364.063 

*** 

466560 

0.18750 

28343.520 

4.000 

24.063 

3126.563 

*** 
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6  Extensions  and  applications 


The  algorithm  described  above  can  be  accelerated,  generalized  and  applied 
in  several  ways.  We  discuss  acceleration  by  change  of  basis  via  low-rank  ap¬ 
proximation,  extension  to  the  evaluation  of  Galerkin  matrix  elements,  and 
generalization  to  Laplace  and  Gauss  transforms. 


6. 1  Acceleration 


A  standard  ingredient  of  many  other  butterfly  algorithms  is  a  choice  of  basis 
functions  which  accelerates  the  algorithm.  For  example,  Taylor  series  approx¬ 
imation  of  the  exponential  is  suboptimal  for  approximation  on  a  real  interval. 
Minimax  or  Chebyshev  approximation  can  attain  equal  accuracy  with  fewer 
terms,  leading  to  a  factor  of  2  speedup  in  the  one- dimensional  pointwise  but¬ 
terfly  algorithm  of  [15]. 

Often  the  basis  functions  are  chosen  implicitly  by  transforming  the  shift  and 
merge  matrices  in  the  intermediate  steps.  Butterfly  algorithms  can  be  thought 
of  as  matrix  factorizations  [22],  where  the  basic  matrix  A  with  elements 

Ajk  =  exp  itj Sk 


is  written  as 

A  =  EB1B2  •  •  •  BlF. 


Here  F  forms  source  values  into  coefficients,  E  evaluates  approximate  values 
at  targets,  and  the  intermediate  matrices  Bj  are  sparse  block  matrices  that 
perform  shift  and  merge  operations  on  the  coefficients.  Most  of  the  computa¬ 
tional  effort  goes  into  applying  the  intermediate  factors  Bj,  so  economizing  Bj 
pays  large  rewards.  For  example,  consider  diagonalizing  each  block  of  Bj  as 
in  [33,34],  Then  shift  and  merge  operations  become  diagonal,  reducing  their 
cost  substantially  from  0(M2)  to  O(M).  However,  the  transformation  to  di¬ 
agonal  form  is  difficult  to  implement  in  an  efficient  stable  form.  Thus  popular 
alternatives  tend  to  employ  factorizations  such  as  the  SVD  or  interpolative 
decompositions. 

Interpolative  decompositions  factorize 

Bj  =  XYZ 


Disirii 
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with  X  and  Z  submatrices  of  Bj  and  Y  small  and  well-conditioned.  They  speed 
up  the  application  of  Bj  to  arbitrary  vectors,  and  offer  considerable  speedups 
in  the  butterfly  algorithm.  We  plan  to  incorporate  such  factorizations  into 
future  versions  of  our  code. 


6.2  Galerkin  matrix  elements 


In  many  applications,  pointwise  evaluation  of  the  Fourier  transform  is  less 
desirable  than  the  application  of  Galerkin  matrix  elements 


fkj 


exp(i tTs)fj(s)  ds  dt, 


(36) 


where  Tk  C  RD  are  d-dimensional  target  simplices  and  gk  are  degree-p  poly¬ 
nomials.  Our  algorithm  extends  immediately  to  the  fast  application  of  these 
Galerkin  matrices.  The  Taylor  expansion  (11)  separates  the  variables  t  and  s 
so  that 


fkj  —  exp(irTcr)  ^  (37) 

|«|<m 

J(t-  r)a  exp(i (t  -  r)Ta)gk(t)  dt 

Tk 

J(s-a)a  exp(i tt(s  -  a))fj(s)  ds  +  Fm 
Sj 

with  controllable  error  Fm.  Summing  over  j  yields  the  same  source-centered 
coefficient  vectors  as  Fig.  (10).  After  the  butterfly  of  Section  4  produces  target- 
centered  coefficient  vectors,  the  final  evaluation  step  of  Fig.  (10)  is  replaced 
by  a  dimensional  recurrence  identical  to  (28)  with  sources  replaced  by  targets. 

The  resulting  algorithm  requires  a  slightly  strengthened  version  of  inequality 
(10).  Since  there  are  simplices  rather  than  points  in  both  target  space  and 
source  space,  the  rank-M  kernel  approximation  (11)  maintains  accuracy  only 
if 


/  /?e  \  m 

\(t  —  t)t (s  —  cr) I  <  R  where  (  —  )  <e  (38) 

V  m  J 

whenever  S'  is  a  source  tree  cell  with  center  a,  T  is  a  target  tree  cell  with  center 
r,  Tk  C  T,  Sj  C  S,t  G  Tk  and  s  G  Sr  Roughly  speaking,  kernel  approximation 
fails  if  there  are  large  simplices  in  both  source  and  target  space.  This  is  natural 
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since  large  simplices  encounter  many  wavelengths  of  the  oscillatory  kernel 
exp(itTs),  and  thus  require  high-frequency  approximation.  This  restriction  can 
be  eliminated  by  simplex  subdivision  [12]. 


6.3  Laplace  and  Gauss  transforms 


Counterbalancing  the  potential  speedups  available  from  optimized  basis  func¬ 
tions  is  a  loss  of  generality.  For  example,  Taylor  expansion  of  the  exponential 
is  optimal  over  disks  rather  than  intervals.  Hence  our  algorithm  extends  imme¬ 
diately  to  evaluate  the  fast  Laplace  transform  [19],  where  i  =  y/—l  is  replaced 
by  -1. 

The  pointwise  Gauss  transform  [17,18] 

N 

f(tk )  =  5ZexP(~ll^  -  sj \\2)fj,  1  <k<N,  (39) 

j= i 


where  ||f||  is  the  Euclidean  norm,  can  be  viewed  as  a  pre-  and  post- pro  cessed 
Laplace  transform  since 

exp(-||4  -  Sj\\2)  =  exp(-||4||2)exp(2^sJ)exp(-||sj||2). 

Hence  our  piecewise-polynomial  butterfly  algorithm  immediately  extends  to 
evaluate  the  piecewise-polynomial  Gauss  transform  with  matrix  elements 

fkj  =  J  9k(t )  J  exp(— ||f  -  s||2)/j(s)  dsdt.  (40) 

Tk  Sj 
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Abstract 

A  bilinear  quadrature  numerically  evaluates  a  continuous  bilinear  map,  such  as 
the  L 2  inner  product,  on  continuous  /  and  g  belonging  to  known  finite-dimensional 
function  spaces.  Such  maps  arise  in  Galerkin  methods  for  differential  and  integral 
equations.  The  construction  of  bilinear  quadratures  over  arbitrary  domains  in  Rd  is 
presented.  In  one  dimension,  integration  rules  of  this  type  include  Gaussian  quadrature 
for  polynomials  and  the  trapezoidal  rule  for  trigonometric  polynomials  as  special  cases. 
A  numerical  procedure  for  constructing  bilinear  quadratures  is  developed  and  validated. 


1  Introduction 

Classical  quadratures  such  as  Gaussian  and  trapezoidal  rules  accurately  evaluate  continuous 
linear  functionals  such  as 

/  f(x)w(x)  dx 
Jn 

for  /  in  a  finite-dimensional  space  of  continuous  functions.  Bilinear  quadratures  evaluate 
continuous  bilinear  forms  such  as  the  weighted  L 2  inner  product 


(f,g)L2=  /  f{x)g{x)w{x)dx 
Jn 


or  the  weighted  H 1  inner  product 


=  /  X 


fdf  ,  ,  dg  \ 
I - aij{x) - I 


'  *,j= l 


\dx 


dxj) 


+  f{x)g(x)  dx 


on  finite-dimensional  spaces  of  continuous  functions  /,  g  on  17  C 

L 2  inner  products  compute  orthogonal  projections  onto  subspaces,  while  H 1  inner  prod¬ 
ucts  provide  local  solutions  to  elliptic  problems,  a  key  ingredient  of  the  finite  element 
method.  For  example,  let  fi  C  be  a  smooth  bounded  domain,  let  L  be  a  uniformly 
elliptic  operator,  /  £  g  £  L2(5n),  and  7  £  L°°(<9fl).  Consider  the  Robin  problem 


Find  u  £  H1(fl )  satisfying 
Lu  =  f  in  Cl,  ju  +  =  g  on 


(1.1) 


When  L  =  —A,  then  if  a  bilinear  form  a  :  H1( f!)  x  if1(i7)  — >  R  is  defined  by  a(u,v)  = 
fn  Du  ■  Dv  +  fdn  quu,  the  weak  formulation  to  (1.1)  seeks  u  £  H 1(fl)  satisfying 


a(u,v)  =  {f,v)L 2(n)  +  (g,v)L 2 for  all  v  £  H ^fi). 


(1.2) 


The  Galerkin  method  constructs  an  approximate  solution  to  (1.2)  by  choosing  finite-dimensional 
function  spaces  Do,  Go  and  seeking  uq  £  Do  satisfying 


a(u0,u0)  =  (f,v o)i2(n)  +  (g,v o)L2(3fi)  for  all  v0  G  Go- 


(1.3) 
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The  linear  system  (1.3)  is  solved  in  a  basis,  which  requires  computing  a  number  of  L 2  inner 
product  integrals.  These  integrals  should  be  computed  both  efficiently  and  accurately. 

Efficiency  is  achieved  by  using  the  fewest  function  evaluations  possible.  When  d  >  1, 
the  optimal  efficiency  of  a  classical  quadrature  is  unknown.  For  a  bilinear  quadrature,  the 
minimum  number  of  function  evaluations  is  equal  to  the  dimension  of  function  space  being 
integrated.  The  inner  product  of  two  functions  /,  g  belonging  to  given  finite-dimensional 
function  spaces  is  computed  by  the  formula 

(f,g)  =  f(x)*Wg(  y),  (1.4) 

where  /(x)  £  Rm  and  g(y)  £  R”  are  evaluations  of  /  and  g  at  sets  of  points  x  and  y  in 
57,  respectively,  and  W  is  a  matrix.  The  rank  of  the  bilinear  form  is  equal  to  the  rank  of 
W,  hence  the  minimal  number  of  required  function  evaluations  is  equal  to  the  dimension  of 
that  function  space. 

Accuracy  is  achieved  by  defining  and  minimizing  integration  error.  In  a  bilinear  quadra¬ 
ture,  this  is  a  nonlinear  optimization  problem  for  x,  y,  and  W  in  (1.4),  and  is  solved  using  a 
Newton  method  for  an  appropriate  objective  function  [CRY99,  BGR10,  XG10].  In  this  pa¬ 
per  an  objective  function  is  developed  and  demonstrated  to  yield  numerically  useful  bilinear 
quadrature  rules  in  a  general  setting. 

Numerical  evaluation  of  inner  product  integrals  has  been  studied  in  [BD71,  McG79, 
Gri80,  BGR10,  Chel2]  and  as  “bilinear  quadrature”  in  [LZ87,  Kno07].  This  paper  bor¬ 
rows  some  of  the  framework  from  these  past  works  but  develops  and  utilizes  a  different 
optimization  procedure  to  produce  quadrature  rules. 


2  Theory 

2.1  Abstract  formulation 

In  this  section  the  problem  of  evaluating  a  general  continuous  bilinear  form  on  a  pair  of 
Banach  spaces  is  considered.  Results  are  given  in  great  generality  so  that  they  apply  to  any 
continuous  bilinear  forms.  Later,  these  results  are  applied  to  useful  special  cases  such  as  the 
L 2  and  H 1  inner  products. 

Definition  2.1.  Let  T  and  Q  be  real  Banach  spaces.  Then  a  bilinear  quadrature  of  order 
(m,  n)  on  T  x  Q  is  a  bilinear  form  Q  defined  by  linear  maps  L\  :  T  — >  Rm  and  Li  :  Q  — i  R" 
and  a  bilinear  map  B  :  Rm  x  Rn  — >  R,  such  that,  for  each  /  £  T  and  g  £  Q, 

Q{f,g)  =  B(L1f,L2g). 

Definition  2.2.  Let  T,  Q  be  real  Banach  spaces  with  a  continuous  bilinear  form  (•,•)  : 
T  x  Q  — >  R.  Finite-dimensional  subspaces  Jo  C  J  and  Qo  C  Q  are  a  dual  pair  if 

V/e  J 0  \  {o}, 3 g  £  g0  such  that  (/, g)  ±  0, 

V3  G  So  \  {0},  3/  e  T0  such  that  (/,  g)  ±  0. 

If  To,  Qo  are  a  dual  pair  then  dim(J'o)  =  dim(f7o)- 

Definition  2.3.  Let  T ,Q  be  real  Banach  spaces  with  a  continuous  bilinear  form  (•,•)  : 
T  x  Q  — >  R,  and  let  To  C  T  and  I/o  C  g  be  a  dual  pair.  A  bilinear  quadrature  Q  on  T  x  g 
is  exact  with  respect  to  To  x  t/o  if 

(f,9)  =  Q(f,g)  for  every  /  £T0,g  £  £0- 
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Such  a  bilinear  quadrature  evaluates  the  bilinear  form  on  J o  x  Go  exactly.  We  have  the 
diagram 

Jo  x  Go  Km  x  R" 


s(-,) 

>  ' 

R 


If  the  parent  spaces  J,  Cl  are  implied,  we  will  abuse  notation  by  referring  to  an  exact  bilinear 
quadrature  on  Jo  x  Go- 

Remark.  If  J,  G  are  infinite-dimensional  and  Q  is  exact  on  J0  x  Go,  then 

sup  \Q(f,g)  -  (f,g}\  =  oo, 
f&r.geg 

so  a  bilinear  quadrature  can  only  be  accurate  on  finite-dimensional  subspaces. 

Lemma  2.4.  Let  Jo  C  J  and  Go  C  G  be  a  dual  pair,  and  let  L\  :  J  — »  Rm,  L2  :  1/  — »  Rn  fee 
linear.  Then  there  exists  bilinear  B  :  Rm  x  R™  — >  R  such  that  the  map  Q(f ,  g )  =  B(Lif,  L2g) 
zs  an  exact  quadrature  on  To  x  C/o  */  and  only  if  Li\-po  and  L2\g0  are  both  injective. 

Proof.  Suppose  B  exists.  If  /,  /  €  Jo  are  distinct  then  there  exists  g  G  Go  such  that 

0  ?(f-  f,  9)  =  Q{f  -  f,  9)  =  B(L 1  (/  -  /),  L2g), 

so  L\f  7C  L\f  and  Lil^  is  injective.  Similarly  for  L2|g0- 

Suppose  Li\jra  and  L2 \g  are  injective.  Their  Moore-Penrose  pseudoinverses  (Li|jro)+ 
and  {L2\go)+  left-invert  Li  and  L2,  respectively.  Define  a  bilinear  map  on  Rm  x  R”  by 

B(x,y)  =  ((L1\;Fo)+x,(L2\go)+y) . 

Then,  for  all  /  G  J0,  g  €  Go, 

B{L\f,  L2g )  =  <(ii|jr0)+  iiU  /,  (i2|go)+  L2|eo  «?) 

=  (f,9)- 


□ 

From  Lemma  2.4  a  necessary  condition  for  an  exact  bilinear  quadrature  is  that  to  > 
dim  J),n  >  dim  Go-  Minimal  order  is  achieved  when  to  =  dim  Jo,  n  =  dim  C/o  and  B{x,y) 
is  uniquely  given  by 

B(x,y)  =  (s{L1\;Fa)~1x,  (L2\gQ)~1y)  . 

Exact  bilinear  quadratures  are  not  unique,  as  there  are  many  possible  linear  maps  L\,L2. 
Furthermore,  B  may  not  be  unique,  since  if  n  >  dim(J0),  then  L has  infinitely  many 
left  inverses.  Therefore,  a  method  is  needed  to  choose  among  the  infinitely  many  bilinear 
quadratures.  One  metric  of  quality  is  that,  in  addition  to  its  exactness  on  Jo  x  Go,  the 
bilinear  quadrature  also  approximates  (f,g)  for  some  set  of  g' s  outside  of  Go- 

Definition  2.5.  Let  Jo  C  J  and  Go  C  G  be  a  dual  pair,  and  let  Gi  C  G  be  another 
finite-dimensional  subspace  such  that 

Gi  C  Tq  :=  {g  S  G  :  ( f,g )  =  0  for  all  /  €  J0}. 

Let  Q  be  a  set  of  bilinear  quadratures  exact  on  Jo  x  Go-  Then  Q  €  Q  is  called  minimal  on 

Gi  if 

Q  =  arg  min  a(Q;  J0,  Gi),  (2.1) 

QeC 
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where 


cr(Q;  Jo,£i)  :=  max 


m,g)\ 


ll/IMMIe 

o^/eJ^o 

If  Q  is  minimal  on  Gi,  then  it  approximates  the  pairing  of  Tq  and  Go  ®  Gi-  Precisely,  if 
/  €  T0,  g  £  Go  ®  Gi,  and  we  write  g  =  g0  +  gi  with  <7,  £  Gi,  then 


I Q(f,g)  -  (f,g) I  =  IQ(/,ffi)|  <  ^Q;Po,Gi)\\f\\Mg- 


(2.2) 


Thus,  minimizing  a(Q;  To,  Gi)  will  improve  the  approximation. 

One  important  special  case  for  bilinear  quadratures  is  the  symmetric  case,  which  is  when 
F  =  G  is  an  inner  product  space.  In  this  case,  a  bilinear  quadrature  computes  an  orthogonal 
projection. 

Definition  2.6.  Let  To  and  Go  be  a  dual  pair  in  an  inner  product  space.  Let  {/)}  be  an 
orthonormal  basis  for  To-  Given  a  bilinear  quadrature  Q  exact  on  To  x  Go,  the  approximate 
orthogonal  projection  onto  To  arising  from  Q  is  the  linear  map  Pq  given  by 

pq{9 )  =  ^2Q{fi,9)U 


An  error  estimate  for  orthogonal  projections  similar  to  (2.2)  is  given  later  in  Theorem  2.7. 


2.2  Integral  formulation 

In  this  section,  the  bilinear  quadrature  framework  is  applied  to  the  evaluation  of  Sobolev 
inner  products  on  function  spaces.  Let  O  C  Rd  be  a  bounded  domain,  and  let  T  =  G  = 
Cr(il),  r  a  non- negative  integer,  equipped  with  a  Sobolev  inner  product 


(f,9)H-  =  E  (Daf’Da9)LHn) 

\ot\<s 

for  s  <  r. 

Choose  a  dual  pair  To,  Go  hr  T.  Exactness  on  To  x  Go  requires  linear  maps  L\  :  Cr(Cl)  — »• 
Rm,  L2  :  Cr(fl)  — ►  Rn,  and  bilinear  form  B  :  Rm  x  Rn  — ►  R  so  that  for  every  /  €  To,  g  €  Go, 
B(L1f,L2g)  =  {, f,g)H> ■ 

Appropriate  linear  maps  L\,  L2  are  pointwise  evaluations  at  particular  points  in  f 2.  Thus, 
for  the  points  x  =  (aq, . . . ,  xm)  £  f lm,  y  =  (yi, . . . ,  yn)  £  fln,  define 


’/(®l)" 

3(yi)" 

,  L2g  :=  g( y)  = 

j{xm) 

9i.Vn)_ 

Given  bases  /3  =  {/1, . . . ,  fk}  for  To  and  {<71, . . . ,  g^}  for  Go,  let  M  £  Mfcxfc  be  the  Gram 
matrix  with  entries 

(  f  'i ,  !J;j  )  Hs  • 

Since  To,  Go  are  a  dual  pair,  M  is  invertible.  Define  matrix  functions 


Fl(xi )  • 

•  fk(x  1)" 

~Gi(yi)  ■ 

■  9k{yi) 

F(x) := 

,  G( y)  ■= 

1  (*^m) 

_Gi{yn)  ■ 

■  9k{yn)_ 

To  make  L 1  and  L2  are  injective,  choose  x,y,  such  that  F(x)  and  G( y)  have  full  column 
rank.  If  B{y,w)  =  v*Ww  for  all  v,w  for  an  to  x  n  matrix  W,  then  the  bilinear  quadrature 
is  exact  if  and  only  if 

F(x)*WG(y)  =  M.  (2.3) 
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Therefore  a  bilinear  quadrature  rule 


Q(f,g)  =  f(x)*Wg(  y)  (2.4) 

evaluates  {f,g)Hs  exactly  for  any  /  E  Po,g  €  Go-  The  corresponding  approximate  orthogo¬ 
nal  projection  onto  T o  is 

k 

pQ(g)  =  J2\M*yw9(  y)]/i- 

In  the  basis  /3,  the  approximate  projection  is  computed  by 

[^Q(<?)]/3  =  F(x-TWg(y)  E  Mfc.  (2.5) 


Good  values  for  the  matrix  W  and  evaluation  points  x,  y  must  be  determined.  Without 
loss  of  generality,  suppose  that  the  bases  {/*}  and  {gj}  are  i7s-orthonormal  in  Cr(Gl).  Select 
finite-dimensional  G i  C  Cr(fl)  for  the  minimization  (2.1)  and  define  the  feasible  set  Q  to 
be  all  quadratures  of  the  form  (2.4)  satisfying  (2.3).  If  {71, . . .  ,  yp}  is  an  orthonormal  basis 
for  Gi,  define 

'71(11)  ...  7p(xi)" 


T(x) := 


E  Rnxp. 


Jl(xn)  ...  7 p(xn) 


Then  (2.1)  can  be  reformulated  as 


mino-(<3;  F0,Gi) 


min  max 
QeC  fleSi,||s||G=i 

/e^o,ll/l|p=i 


\QU,9)\ 


min  max  |6*F(x)*WT(y)a| 
x<yW  a,beRfc 

||a||2  =  ||6||2=l 


min  ex  1  (F(x)*WT(y))  subject  to  F(x)*WG(y)  =  M, 

x.y  .W 


(2.6) 


where  07(A)  is  the  leading  singular  value  of  a  matrix  A.  Minimization  (2.6)  is  independent 
of  x,  since  by  (2.3)  F(x)*W  =  ML ,  where  L  is  a  left  inverse  of  G(y).  Therefore  x  is  chosen 
by  performing  a  similar  minimization  on  the  left,  setting  an  orthonormal  basis  {A;}  for  a 
space  T\  C  Gq  ■  defining  the  corresponding  matrix  function  A(x),  and  minimizing 

min  ay  (A(x)*WG(y))  subject  to  F(x)*WG(y)  =  M,  (2.7) 

x,y,W 

where  similarly  the  dependence  of  (2.7)  on  y  may  be  dropped  since  WG(y)  is  equal  to  L*M , 
where  L  is  a  left  inverse  of  ir(x). 

In  the  symmetric  case  Tq  =  Go,  M  =  J,  Ty  =  G 1,  and  m  =  n  =  k  with  x  =  y, 
minimizations  (2.6)  and  (2.7)  are  equivalent  and  simplify  to 

mincri(F(x)_1r(x)).  (2.8) 

X 


In  subsequent  sections  special  attention  is  given  to  the  symmetric  case  because  it  is  used 
for  evaluating  orthogonal  projections. 


2.3  Error  estimates 

In  this  section,  upper  bounds  on  several  error  quantities  in  computing  an  approximate 
orthogonal  projection  of  the  form  (2.5)  are  estimated. 

Theorem  2.7  (Euclidean  norm  error  estimate).  Let  J-q .  Go  be  a  dual  pair  in  an  inner 
product  space  T  and  Q  a  bilinear  quadrature  of  the  form  (2.4)  that  is  exact  on  J71  x  Go- 
Let  Pq  be  the  approximate  orthogonal  projection  onto  Pq  arising  from  Q  with  coordinate 
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representation  (2.5).  If  P  is  the  exact  orthogonal  projection  operator  onto  P o,  Q\  C  Jyj1, 
and  g  =  g0  +  g\  £  Q0  ®  Gi  such  that  gt  £  Gi, 


\\[PQ(g)  -  P(g)]f 3II2  <  v(Q;Po,Gi)\\gi\\, 

where  ||  •  || 2  is  the  Euclidean  norm. 

Proof.  This  is  essentially  the  same  as  (2.1).  Since  (fi,g)  =  Q{fi,go),  then 


1/2 


\\[Pq(9)  -  P(g)b II2  =  E  \QVi’9)  -  (fi,g) r 


\i=l 

'  k 


1/2 


=  [Yl\Q{fi,g)-Q{fi,go)\2 

\i= 1 

/  fc  x  1/2 


1  ' 

=  m^x  77  rp  E'  OiiQ{fi,gi), 

IMh 

where  a  =  (af)  £  Mfc.  Each  f  £  IF, 0  can  be  written  as  /  =  J2i  aifi >  so 

k 


max  ti — 77-  OLiQ(fi.gi)  =  max 
a^o  a2^  w  oa/s^-0 

2  =  1 


Q(f,g  1) 


<ff(Q;^b,ai)||ffi||. 


(2.9) 


□ 


Theorem  2.7  provides  an  error  bound  for  an  approximate  orthogonal  projection  when 
the  projected  function  g  is  in  Go®Gi-  If  Pq  is  a  space  of  polynomials,  then  it  is  also  useful 
to  obtain  an  error  estimate  that  depends  on  the  regularity  of  g. 

Theorem  2.8  (Uniform  norm  error  estimates  for  polynomials).  Let  P  =  C(£i)  with  U  C 
a  hounded ,  convex  domain,  equipped  with  the  L 2  inner  product.  Let  Pq  be  the  set  of 
multivariate  polynomials  of  degree  at  most  n  with  an  orthonormal  basis  (3  =  {fi},  let  P  : 
P  — ►  P  be  the  orthogonal  projection  onto  Pq,  and  suppose  Pq  is  an  approximate  orthogonal 
projection  onto  Pq  with  coordinate  representation  (2.5).  There  exist  a  constant  C  >  0  such 
that  for  every  g  £  Cn+1(fl),  then 

II [Pg  -  PQg\p lloo  <  C\\Dn+1g\\LaB,  (2.10) 

where 

\\Dn+1g\\L~  :=  E  max \Dag(x)\. 

|a|=n+l 

Proof.  Using  (2.5)  and  the  exactness  of  Pq  on  Pq,  then  writing  g  =  Pg+  (/  —  P)g  =  go  +  gi, 
we  have 


\\[Pg-PQgMoo  =  \\F*Wg1(x)\\00 

<  \\F*W\\  00— >-oo  \\(l-P)g\\co 

<  ||.F*  Wjloo—^-oo  ||  (/  P){g  (?)||c70) 

where  q  is  any  element  of  Pq  and  ||  •  ||oo->-oo  is  the  induced  matrix  00  norm.  Then 

llt^S  -  PQg\, 3II00  <  ||^W||oc^oc(l  +  ||J°||c70_oo)||ff  -  gllco, 
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where  the  C°  operator  norm  of  P  is  given  by 


\\P\\ 


c°- 


>c  o  =  max 

x€zQ, 


dt. 


By  the  Deny-Lions/Bramble-Hilbert  lemma  [EG04],  for  all  g  £  C'rl+1(fl)  there  exists  a 
constant  Cbh  >  0  (dependent  on  n  and  Cl)  such  that 


inf 

q&Fa 


\\g  -  q\\c°  <  CBH\\Dn+  g\\L 


which  combined  with  the  previous  inequality  yields  the  desired  result  with 


C  =  ||-F*W/||00_).00(1  +  \\P\\c°^c°)Cbh- 


□ 

In  the  presence  of  round-off  error  in  function  evaluation,  the  conditioning  of  an  approx¬ 
imate  orthogonal  projection  is  also  important  to  quantify. 

Theorem  2.9.  Let  Pq  be  an  approximate  orthogonal  projection  of  the  form  (2.5).  If  5g{ y) 
is  the  absolute  error  in  computing  g{ y)  and  SPQ(g)  is  the  resulting  projection  absolute  error, 
then  with  respect  to  a  vector  norm  ||  •  ||, 

\\[SPQ(gM  ll*g(y)l| 
ll[JPQ(s)]/i||  “  ll<?(y)ll  ’ 

where  k  =  k(F*  fx)W)  is  the  matrix  condition  number  with  respect  to  ||  •  ||. 

2.4  Classical  and  bilinear  quadratures  on  univariate  polynomials 

In  this  section  we  review  Gaussian  quadratures  and  show  they  are  a  special  case  of  a  bilinear 
quadrature  in  one  dimension.  We  then  propose  a  way  to  generalize  to  quadratures  evaluating 
inner  products  of  polynomials  on  multidimensional  domains. 

Definition  2.10.  Let  C  be  a  connected  domain.  A  classical  quadrature  q  of  order  n 
on  Cl  is  a  linear  functional  defined  by  a  set  x  =  (aq, . . .  ,xn),  Xi  £  Cl,  called  the  nodes ,  and 
a  vector  w  £  M™,  whose  components  are  called  the  weights,  such  that  for  any  f  £  C(Cl), 

n 

?(/)  =  w*/(x)  =  wif(xi)- 

i= 1 

Furthermore,  if  .Fo  is  a  subspace  of  C(n)  and  g  is  a  Borel  measure,  then  q  is  said  to  be 
exact  on  Fq  if 

?(/)  =  [  fdg 
Jn 

for  all  /  e  Jo- 

Let  Pn  be  the  space  of  univariate  polynomials  of  degree  up  to  n,  I  an  open  interval,  and 
g  a  finite  absolutely  continuous  Borel  measure  on  /. 

Definition  2.11.  Suppose  P2n-i  is  /i-integrable  on  I.  Then  a  Gaussian  quadrature  of  order 
n  on  I  is  a  classical  quadrature  of  order  n  on  I  that  is  exact  on  P2n-i  with  respect  to  g. 

The  advantages  and  disadvantages  of  the  theory  of  quadratures  for  polynomials  are 
rooted  in  existence  and  uniqueness  result  for  Gaussian  quadratures. 

Theorem  2.12.  Suppose  P2n-i  is  g-integrable  on  I,  and  let  {4>k}  denote  any  set  of  L2(I,  g)- 
orthonormal  polynomials  such  that  deg (</>&)  =  k.  Then  the  following  sets  are  equal: 
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1.  The  zeros  of  <pn. 

2.  The  eigenvalues  of  the  symmetric  bilinear  form  on  Pn_i  given  by 

B(f,g)  '■=  j i  xf  (x)g(x)  dg  =  {xf(x),g(x))L2{I^  . 

3.  The  nodes  {xi}  of  a  Gaussian  quadrature  of  order  n  on  I. 

Proof.  (1  -4=f>  2)  Since  B(-,-)  is  symmetric  it  is  diagonalizable  with  n  real  eigenvalues  {A.;}. 
If  a  polynomial  ipi(x)  is  an  eigenvector  for  A.;,  then  for  0  <  j  <  n  —  1, 

(xipi{x),(!>j(x))  =  A i{<ipi(x),<j)j(x))  ==>  {(x-X i)ipi(x),(j)j(x))  =0. 

Since  (x  —  A i)ifi{x)  £  P n  for  each  i,  and  the  only  polynomials  in  P „  that  are  orthogonal  to 
each  of  <j>o,  ■■■ ,  4>n- 1  are  multiples  of  <fn,  then  each  (x  —  Xf)  is  a  factor  of  <j>n(x).  Thus  4>n(x) 
is  a  multiple  of  (x  —  Ai) . . .  (x  —  An)  and  its  zeros  are  the  eigenvalues  of  £?(•,  ■). 

(2  <f=>-  3)  Suppose  a  Gaussian  quadrature  with  weights  {m^}  and  nodes  {x^}  exists. 
With  respect  to  the  basis  of  orthonormal  polynomials  {4>k},  the  bilinear  form  B(-,-)  has  a 
symmetric  matrix  represention  B  with  entries  given  by 

/n 

x<j>i(x)(j>j(x)  dg  =  Ywkxk<t>i{xk)<t>j{xk). 

fe= l 


If 


x/WkMxk) 

Xi 

Uk  = 

,x  = 

yjwk&n—l  (Xk) _ 

Xn_ 

then 

n 

B  =  YxkUkul  =  UXU*.  (2.11) 

fe= l 

Since  Sij  =  Y^7k=iwk<t>i(xk)4>j{xk),  then  /  =  UU*  and  U  is  a  unitary  matrix.  Then  (2.11) 
is  the  unitary  diagonalization  of  the  symmetric  matrix  B  with  eigenvalues  given  by  the 
Xfc’s.  if 

Remarks.  Theorem  2.12  shows  that  if  a  Gaussian  quadrature  of  order  n  exists,  its  nodes  are 
the  zeros  of  f>n .  The  existence  proof  is  completed  by  showing  the  weights  exist  and  satisfy 

n 

l/m  =  5X*,))8. 

0=0 

Therefore,  taking  the  square  root  ^Jwk  is  legitimate  [DR84].  Theorem  2.12  also  provides  an 
efficient  method  to  construct  these  quadratures.  The  matrix  B  in  (2.11)  is  tridiagonal,  so 
its  eigenvalues  can  be  calculated  quickly,  even  for  very  large  n  [GW69]. 

Gaussian  quadrature  is  optimal  for  integrating  polynomials  on  an  interval,  but  does  not 
extend  readily  to  higher-dimensional  domains.  The  zeros  of  a  multivariate  polynomial  are 
generally  not  isolated  (consider  f(x,y)  =  xy)  so  they  cannot  all  be  used  as  nodes  of  a 
classical  quadrature.  Additionally,  the  connection  between  nodes  and  eigenvalues  no  longer 
holds  since  the  eigenvalues  are  only  scalars  (the  connection  extends  to  two-dimensional 
domains  with  complex  eigenvalues  [VR14]). 

Bilinear  quadratures  make  sense  in  any  dimension,  yet  contain  Gaussian  quadrature  as 
a  special  case.  Consider  a  classical  quadrature  with  nodes  x  =  {x,}  and  weights  w  =  {mi}. 
If  a  function  h(x)  =  f(x)g(x)  with  /,  g  belonging  to  function  spaces  J7,  Q  respectively,  then 
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the  classical  quadrature  q  evaluated  on  h  is  the  same  as  a  bilinear  quadrature  Q  on  /,  g 
given  by 

Wl 


Q(/,<?)=/(x)* 


g(x)  =  w  *h{x)  =  q{h). 


w 


n 


The  matrix  W  is  diagonal  with  entries  given  by  the  weights  of  the  classical  quadrature. 
Thus  for  a  general  bilinear  quadrature  of  the  form  (2.4)  the  entries  of  W  can  be  viewed  as 
analogues  of  the  weights. 


Theorem  2.13.  The  nodes  of  a  Gaussian  quadrature  of  order  n  are  the  same  as  the  points 
x  in  the  unique  bilinear  quadrature  of  order  (n,n)  on  Pn_i  x  Pn-i  that  is  minimal  on 
span{^n},  where  <pn  is  the  orthonormal  polynomial  of  degree  n.  Furthermore,  the  matrix 
W  in  (2.4)  is  a  diagonal  matrix  whose  diagonal  entries  are  the  weights  of  the  Gaussian 
quadrature. 


Proof.  Let  <j>o, . . .  ,<j>n—i  be  the  orthonormal  polynomials  up  to  degree  n  —  1  such  that 
degcfk  =  k,  x  =  (xi, . . . , xn)  £  12",  and  define 


$(x)  = 


4>o{Xl)  ...  <f>n-l(xi) 


4*0  (Xn)  ■  ■  ■  fyn—liXn) 

Then  the  minimization  problem  (2.8)  becomes 

4>n(xi) 


min  rr i  <l>(x) 
xGQ™  1 


0n (^n) 


This  is  uniquely  minimized  (up  to  reordering  of  the  xf  s)  when  x  is  the  set  of  zeros  of  (j>n 
in  which  case  <j\  =  0.  The  corresponding  bilinear  quadrature  Q  exactly  evaluates  products 
where  one  polynomial  has  degree  n  —  1  and  the  other  has  degree  n.  Then  Q  has  the  same 
evaluation  points  as  a  bilinear  quadrature  formed  from  the  Gaussian  quadrature.  Since  W 
is  unique,  then  it  must  be  equal  to  the  diagonal  matrix  with  entries  given  by  the  weights  of 
the  Gaussian  quadrature.  □ 

The  above  result  suggests  that  a  good  way  to  accurately  compute  inner  products  of  poly¬ 
nomials  on  a  multidimensional  domain  is  to  utilize  a  symmetric  bilinear  quadrature  that  is 
exact  on  P„  x  Pra  and  minimal  on  P„+i  ClP^.  Just  as  Gaussian  quadratures  accurately  inte¬ 
grate  nearly  polynomial  functions  accurately,  bilinear  quadratures  constructed  in  the  above 
manner  are  expected  to  evaluate  inner  products  of  nearly  polynomial  functions  accurately. 
Numerical  results  for  these  quadrature  are  shown  in  section  3. 


2.5  Classical  and  bilinear  quadratures  on  trigonometric  polynomi¬ 
als 

For  the  space  of  trigonometric  polynomials 

Tr j_i  =  spanjl,  sinx, . . . ,  sin  (n  —  l)x,  cosx, . . . ,  cos  (n  —  l)x}, 


it  is  known  that  the  ( n  +  l)-point  trapezoidal  rule 


Tra(p)  :=  -p(0)  +  -p{2n)  +  —  ^  p  f 
n  n  n  ^ '  V 

.7=1  x 


2t Tj 
n 
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is  exact  for  integrating  all  p  £  T„_ i  over  the  interval  [0,  27t] .  Since  T„_i  is  a  rotationally- 
invariant  function  space  on  the  circle  R/27tZ,  the  trapezoidal  rule  yields  a  family  of  n-point 
classical  quadratures  for  T„_i  given  by 


/*27t  If _ ^  27f 

/  p(x)  dx  =  —  ’S~~]p(xj)  for  all  p  £  Tn_i,  Xj+ 1  —  =  — . 

Jo  n  ]=0  n 


(2.12) 


When  n  is  odd,  the  above  trapezoidal  rule  quadrature  is  a  special  case  of  a  bilinear  quadra¬ 
ture: 


Theorem  2.14.  Let  n  >  0  be  an  odd  integer.  Then  the  set  of  classical  quadratures  on  Tn_ i 
in  (2.12)  are  equivalent  to  symmetric  bilinear  quadratures  of  order  (n,  n)  on  Ttn-i)/2  x 
T(n-i)/ 2  that  are  minimal  on  spanjsinnx,  cosnx}. 


Proof.  Set  n  =  2fc  +  1.  Since  Tn_  i  is  rotationally  invariant  on  the  circle,  then  if  Q  is 
a  symmetric  bilinear  quadrature  on  Tk  x  of  the  form  (2.4),  a(Q)  is  invariant  under 
rotations  of  the  evaluation  points  x  =  (xi, . . . ,  xn).  Therefore  without  loss  of  generality 
X\  =  0 ,Xj  £  [0,  27t).  Define 


F(x) 


1 

1 

1 

1  1 

e~ikx  2 

pikx  2 

e~i(k+l)x2 

ei(k+l)x2 

^—ikxn 

gikxn 

g-i(fc+l)x„ 

ei(k+l)xn 

and  it  suffices  to  prove  that  choosing  Xj  =  2-7T j/n  solves  the  minimization  problem  (2.1). 

If  Xj  =  2irj/n,  then  the  first  column  of  F(x)  is  the  second  column  of  T(x),  and  the  last 
column  of  F(x)  is  the  first  column  of  T(x).  Therefore,  in  this  case,  J^(x)_1r(x)  =  \en  ei] , 
where  ey-  is  the  j-th  standard  coordinate  vector,  hence  cri(F(x)_1r(x))  =  1. 

We  claim  that  for  any  choice  of  nodes  x  £  [0,  27r)"  with  X\  =  0, 

ai(F(x)"1r(x))  >  1.  (2.13) 


If  (2.13)  is  established,  then  setting  Xj  =  2ir(j  —  1  )/n  yields  a  minimal  quadrature  in  Q. 
To  show  this,  let  u  be  the  first  column  of  F(x)_1r(x).  We  will  show  that  || u|| 2  >  1,  from 
which  (2.13)  follows.  The  column  u  satisfies  the  equation 


F(x)u 


1 

\/27r 


1 

a—i(k+  l)x2 


0—i(k+  l)x„ 


which  is  equivalent  to  the  Vandermonde  system 


1  1 

1 

1 

1  e“2  . 

ez(n—  l)x2 

u  = 

e~ix2 

1  e“"  . 

ei(n—l)xn 

Setting  Zj  =  elXj ,  then  the  entries  of  u  are  the  coefficients  of  a  degree  n  —  1  complex 
polynomial  p(z)  such  that  p(zj)  =  1  / Zj.  Setting  q(z)  =  zp(z ),  it  suffices  to  find  a  degree  n 
polynomial  q(z)  such  that  q( 0)  =  0  and  q(zj)  =  1.  Such  a  q  is  unique  and 

n 

q(z)  = 1  -  II  t1  ~  z!z j)  • 

i=i 

Then  the  leading  coefficient  of  q ,  which  is  also  the  leading  coefficient  of  p ,  has  absolute  value 
1,  and  hence  || u|| 2  >  1-  □ 
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Remark.  The  trapezoidal  rule  uniquely  generates  a  minimal  bilinear  quadrature,  since  in 
that  case  q(z)  =  azn  for  some  |a|  =  1.  If  Xj’ s  are  not  equispaced,  q(z)  has  some  nonzero 
lower-order  coefficients. 

2.6  Lobatto  quadrature  and  the  non-invertible  case 

While  minimizing  the  number  of  evaluation  points  will  reduce  the  cost  of  evaluating  a 
quadrature,  it  may  be  advantageous  to  use  more  points  than  is  optimal  in  order  to  improve 
accuracy.  One  example  is  Lobatto  quadratures  for  polynomials  of  one  variable,  which  use 
more  points  than  Gaussian  quadratures.  In  this  section,  we  observe  Lobatto  quadratures 
are  a  special  case  of  a  bilinear  quadrature  where  extra  evaluation  points  are  used,  in  which 
case  the  matrix  function  F(x)  is  non-invertible.  The  formulation  of  Lobatto-like  bilinear 
quadratures  on  general  domains  is  given. 

Definition  2.15.  Let  P2n_i  be  /r-integrable  on  an  interval  I  =  [a,  b\.  Then  the  correspond¬ 
ing  Lobatto  quadrature  is  a  classical  quadrature  of  order  n  +  1  exact  on  P2rl_i  with  respect 
to  p  such  that  if  xo,  •  ■ . ,  xn  are  the  nodes,  then  Xq  =  a  and  xn  =  b. 

Theorem  2.16.  Suppose  P2n_i  is  p-integrable  on  I ,  and  let  {</>&}  denote  the  unique  set 
of  orthonormal  polynomials  such  that  deg (<j>k)  =  k.  Then  there  exists  a  unique  Lobatto 
quadrature  of  order  n  +  1,  and  the  interior  nodes  {xi  :  1  <  i  <  n  —  1}  are  the  zeros  of 
ic^). 

A  Lobatto  quadrature  of  order  n  +  1  corresponds  to  a  symmetric  bilinear  quadrature 
that  is  exact  on  P„_i  x  Pn_i  and  minimal  on  Pra  in  which  the  W  matrix  is  diagonal  and 
the  matrix  F(x)  is  given  by 


Ma) 

Ma) 

<t>l(xi) 

Mx  i) 

<Pl{xn- 1) 

(xn— i 

Mb) 

4>k(b) 

Unlike  in  the  Gaussian  quadrature  case,  the  matrix  F  =  F(x)  is  not  square,  so  there 
exists  infinitely  many  matrices  W  satisfying  (2.3).  Therefore,  the  simplified  minimization 
condition  (2.8)  cannot  be  employed,  and  one  must  optimize  over  both  the  quadrature  nodes 
x  and  matrices  W.  In  general,  suppose  F  is  m  x  k  and  G  is  n  x  k:  both  with  full  column 
rank.  Then  all  matrices  W  satisfying  (2.3)  are  of  the  form 

W  =  {F*)+MG+  +  Y  —  FF+YGG+,  (2.14) 

where  Y  is  an  arbitrary  mxn  matrix.  In  the  symmetric  case  F  =  G  and  M  =  /,  a  minimal 
bilinear  quadrature  is  found  through  the  unconstrained  minimization 

Find  Y  e  Mmxm  and  x  minimizing  ax  (F+T  +  F*Y{I  -  FF+)T)  (2.15) 

While  computationally  more  expensive,  this  optimization  procedure  can  be  used  to  compute 
symmetric  Lobatto-like  bilinear  quadratures.  First  fix  points  Xq  that  the  bilinear  quadrature 
is  required  to  use,  then  construct  the  (typically  non-square)  matrix  function  F(x o,  x),  where 
only  the  points  x  are  varying.  Then  minimize  according  to  (2.15).  This  procedure  is 
applicable  for  arbitrary  domains  fl,  any  space  of  continuous  functions,  and  any  inner  product 
on  that  space. 

2.7  Change  of  variables 

For  a  bilinear  quadrature  computing  an  L2(f2)  inner  products,  a  bilinear  quadrature  can  be 
cheaply  constructed  for  L2(<f>(fl))  inner  products,  where  $  is  an  affine  invertible  change  of 
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variables.  For  continuous  functions  _/)  on  12,  set 

fi(®(x))  =  fi(x)\  det(D<f>)|_1/2. 

Then 

/  fi(y)fj{y)dy  =  /  fi(x)fj{x)dx  =  (fi,fj)L2( n). 
if! 

The  Jacobian  Z2<L  is  constant  when  $  is  affine,  so  if  the  bilinear  quadrature  on  L2(12)  exact 
on  Jo  x  £/0  is 

Q(f,g)  =  f(x)*Wg{y), 

a  bilinear  quadrature  on  L2(<h(12))  for  (J0  o  d)”1)  x  (Q0  o  is  given  by 

Q(f,g)  =  fmx)yw~g(<t>(  y)),  W  =  fF|det(J$)|-i.  (2.16) 

For  an  Hs  inner  product  with  s  >  0,  in  general  a  new  bilinear  quadrature  cannot  be 
cheaply  constructed  under  a  change  of  variables.  However,  when  $(x)  =  A Ux  +  b  is  affine 
with  A  G  R  and  U  a  unitary  matrix,  a  change  of  variables  can  still  be  performed  at  low 
cost.  Let  W  be  the  matrix  in  a  bilinear  quadrature  of  form  (2.4)  computing  H1( 12)  inner 
products.  Then  write  W  =  Wo  +  Wi,  where  Wo  is  the  matrix  for  a  bilinear  quadrature  that 
computes  L2(12)  inner  products.  Then  a  new  bilinear  quadrature  for  H1  (<h(12))  is  formed 
with  matrix 

W=  |A|_1Wo  +  |A|-3Wi 
and  evaluation  points  mapped  by  <F 

3  Computation 

In  this  section,  a  basic  numerical  procedure  to  produce  symmetric  bilinear  quadrature  rules 
is  described.  Afterward,  some  numerical  examples  of  bilinear  quadrature  rules  are  presented. 

3.1  Orthogonalization 

For  a  function  space  Jo,  one  may  initially  have  a  numerical  routine  to  evaluate  (up  to 
machine  precision)  basis  functions  tpi, . . .  for  Jo  that  are  not  orthonormal.  Assuming 
that  the  inner  products  (ij,  jj)  =  ALy  can  be  computed  exactly,  J(x)  is  computed  from 
'F(x)  and  Gram  matrix  M  by 

1.  Compute  the  lower  triangular  matrix  L  in  the  Cholesky  factorization  M  =  LL*. 

2.  For  a  given  x,  perform  a  lower-triangular  solve  on  the  matrix  equation  *F(x)*  =  LZ. 

3.  Set  J(x)  =  Z*. 

The  same  procedure  can  be  used  to  produce  an  orthonormal  basis  for  the  function  space  Ji 
that  the  bilinear  quadrature  is  minimized  against. 

3.2  Nonlinear  optimization 

For  the  invertible  symmetric  case  we  have  reduced  our  problem  to  the  minimization  problem 
(2.6): 


Find  x  minimizing  di  (J(x)  xr(x))  . 

This  is  a  nonlinear  optimization  problem  in  d  ■  k  variables,  where  d  is  the  dimension  of  the 
integration  region  Q  and  k  =  dim(Jo). 

The  problem  of  minimizing  the  largest  singular  value  of  a  matrix  function  A(x)  is  equiv¬ 
alent  to  minimizing  the  largest  eigenvalue  of  the  symmetric  positive  semidefinite  matrix 
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A*(x)A(x).  This  type  of  the  eigenvalue  optimization  problem  has  been  extensively  studied 
in  its  own  right;  see  [OW95]  [SF95]. 

Often  F(x)  and  T(x),  but  not  their  derivatives,  can  be  accurately  computed.  Also,  the 
multiplicity  of  the  largest  singular  values  are  generally  unknown.  Consequently,  a  quasi- 
Newton  method  is  ideal  for  the  optimization  procedure.  The  objective  function  is  non- 
convex  and  typically  has  multiple  local  minima,  so  the  optimization  procedure  is  run  with 
many  initial  guesses.  Furthermore,  in  the  presence  of  many  nearby  local  minima,  after 
each  convergent  result,  the  computed  points  can  be  perturbed  by  a  small  value  <5  and  the 
procedure  run  again  with  perturbed  points  as  another  initial  guess.  This  is  repeated  until 
suitable  convergence.  While  this  procedure  may  be  expensive,  computing  a  quadrature 
is  typically  a  one-time  cost,  after  which  the  quadrature  can  be  used  repeatedly  for  its 
applications. 

In  our  numerical  experiments,  we  employ  a  quasi-Newton  method  with  BFGS  updates 
as  implemented  as  part  of  Matlab’s  fminunc  routine  [Bro70,  Fle70,  Gol70,  Sha70].  Since 
F(x)-1T(x)  is  a  small,  dense  matrix,  its  norm  is  computed  by  calculating  its  full  SVD. 
Up  to  105  initial  random  points  uniformly  distributed  across  the  domain  are  used,  and  the 
procedure  is  iterated  until  convergence  in  double-precision  arithmetic. 

For  our  numerical  implementation  we  do  not  reinforce  the  constraint  that  the  evaluation 
points  Xi  remain  in  the  integration  domain  Q.  While  in  general  the  full  constrained  mini¬ 
mization  problem  may  be  necessary,  we  have  empirically  observed  that  it  is  not  necessary 
for  quadratures  on  polynomials.  This  can  be  explained  by  observing  that  the  orthogonal 
polynomials  grow  rapidly  outside  of  U;  thus  points  outside  the  domain  are  not  expected  to 
be  good  candidates  for  the  solution  to  the  minimization  problem. 

Remarks.  In  the  case  of  polynomials  it  is  possible  to  accurately  compute  the  gradients  of 
F(x)  and  T(x),  in  which  case  a  quasi-Newton  method  may  be  unnecessary.  The  BFGS 
method  has  been  chosen  since  it  is  robust  for  different  function  spaces. 

3.3  Bilinear  quadratures  on  triangular  domains 

In  practical  applications  one  of  the  most  important  cases  to  consider  is  the  L 2  product 
of  polynomials  on  a  simplex.  For  example,  in  the  finite  element  method  one  typically 
solves  a  two-dimensional  PDE  locally  on  polynomials  supported  on  triangular  domains.  The 
discretization  requires  computing  a  number  of  inner  products.  In  this  section  we  compute 
bilinear  quadratures  that  are  exact  on  polynomials  on  a  triangular  domain. 

Because  the  space  of  polynomials  is  affine-invariant  it  suffices  to  find  evaluation  points 
for  polynomials  on  a  reference  triangle.  Given  a  bilinear  quadrature  on  a  reference  triangle  a 
bilinear  quadrature  for  polynomials  on  any  other  triangle  can  be  cheaply  obtained  using  the 
change  of  variables  formula  (2.16).  A  basis  of  orthogonal  polynomials  on  the  right  triangle 
with  vertices  (—1,  —1),  (—1, 1),  (1,  —1)  is  given  by 

Km,n^y)  =  {2X l-y  0  (3-1) 

where  Pm  is  the  roth  Legendre  polynomial  and  P“,/3  is  the  nth  Jacobi  polynomial  with 
parameters  cr,/3.  These  functions  can  be  computed  efficiently  and  stably  as  in  [XG10]. 

Using  this  basis,  symmetric  bilinear  quadratures  exact  for  the  L 2  inner  product  over  this 
right  triangle  on  J- o  =  Pn  and  minimal  on  T\  =  Pn+i  n  P^  were  computed.  The  minimal 
number  of  evaluation  points  were  used,  in  which  case  the  number  of  points  required  is 

k  =  dim(Pn)  =  ^  . 

In  Table  1,  for  each  computed  bilinear  quadrature  rule,  the  minimized  largest  singular  value 
a  =  cri(F(x)-1T(x))  is  given,  along  with  the  oo-norm  condition  number  of  the  matrix  for 
the  approximate  orthogonal  projection. 
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n 

k 

a 

Koo  (F*W) 

0 

1 

0.00000 

1.00000e+0 

1 

3 

0.14507 

2.82218e+0 

2 

6 

0.30373 

6.29185e+0 

3 

10 

0.47762 

1.15455e+l 

4 

15 

0.65817 

2.03810e+l 

5 

21 

0.78394 

3.39955e+l 

6 

28 

0.87930 

4.71065e+l 

7 

36 

0.95305 

8.48889e+l 

8 

45 

1.05595 

1.09107e+2 

Table  1:  Numerical  results  for  fc-point  bilinear  quadratures  on  Pn  x  Pn  for  L 2  on  the  interior  of  the 
reference  right  triangle. 


Figure  1:  Evaluation  points  for  bilinear  quadratures  on  x  Pn  for  L 2  on  the  interior  of  an 
equilateral  triangle,  for  n  =  4,8. 


In  Figure  1,  the  evaluation  points  of  two  bilinear  quadrature  rules  on  the  equilateral 
triangle  are  shown.  Notice  that  the  points  possess  some  symmetries.  The  expectation 
that  quadrature  points  for  polynomials  should  have  some  symmetries  has  been  exploited 
in  the  past  to  reduce  the  complexity  of  searching  for  classical  quadratures  [XG10].  In  the 
quasi-Newton  method  used  to  solve  (2.8),  however,  no  symmetry  conditions  were  explicitly 
enforced. 


3.4  Numerical  accuracy  of  quadratures  on  triangles 

In  the  section  the  computed  bilinear  quadratures  on  triangles  are  compared  against  existing 
high-order  classical  quadrature  schemes  on  triangles  in  the  setting  of  orthogonal  projections. 
Given  the  space  To  =  P„  on  Q  with  L2-orthonormal  basis  ft  =  {/*},  orthogonal  projection 
operator  P  onto  To,  and  given  g  G  C°°(fl),  we  wish  to  compute 


[Pab 


(/ij3)l2 


(fk,g)Li 


The  column  vector  [Pg\p  can  be  computed  using  either  an  approximate  orthogonal  projec¬ 
tion,  or  a  classical  quadrature  for  each  entry  Jfl  fig. 

Since  the  approximate  orthogonal  projection  matrix  F*W,  weights  of  the  classical  quadra¬ 
ture,  and  locations  of  evaluation  points  are  all  precomputed,  the  flop  cost  for  each  method 
is  solely  determined  by  the  number  of  evaluation  points  needed.  The  28-point  bilinear 
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Pb 

P' 

C 

TP 

Dunavant 

9.38e-14 

3.97e-01 

4.97e-05 

9.06e-03 

Xiao  /  Gimbutas 

3.29e-15 

2.73e-01 

1.91e-05 

4.74e-03 

Bilinear 

3.92e-15 

3.99e-15 

6.74e-06 

1.71e-03 

Table  2:  Average  fI2-norm  relative  error  in  computing  approximate  orthogonal  projection  coefficients 
onto  IP §  for  four  different  sets  of  functions  using  three  methods  that  use  28  function  evaluations. 


quadrature  as  shown  in  Figure  1  was  utilized.  For  comparison  we  chose  two  different  28- 
point  classical  quadratures,  each  exact  on  polynomials  of  degree  up  to  11,  due  to  Dunavant 
[Dun85]  and  Xiao  and  Gimbutas  [XG10],  respectively.  These  quadratures  were  computed 
using  the  libraries  available  from  [Burl5].  Both  classical  quadratures  were  similarly  trans¬ 
formed  to  an  equilateral  triangle  of  side  length  1. 

For  our  numerical  experiments,  we  draw  the  projected  function  g  from  four  different 
probability  distributions  of  functions,  which  we  denote  by  Pg,Pg,  and  TP. 

We  define 

K  ■=  {.9  e  P„  :  IIsIIl2  =  1}, 

with  probability  measure  given  by  drawing  a  random  vector  of  coefficients  uniformly  in 
[—  l,l]fc,  and  then  normalizing  the  coefficients  to  have  H2-norm  1,  and  using  those  as  the 
Fourier  coefficients  on  the  orthonormal  polynomials  on  the  triangle. 

The  set  C  contains  smooth  functions  with  slow  decay,  and  is  defined  by  functions  of  the 
form 

9^y)=  1  +  {aiX  +  a2y)2’ 

where  a  =  (01,02)  is  drawn  uniformly  from  the  unit  circle. 

The  set  TP  contains  smooth  non-polynomial  functions  with  oscillations,  and  has  ele¬ 
ments  of  the  form 

g{x ,  y)  =  eaiX+a2V  cos(4&!2;  +  4 b2y)p{x,  y), 

where  parameters  (01,02)  and  (&i,&2)  are  both  drawn  uniformly  from  the  unit  circle,  and 
p(x,  y)  is  a  random  element  of  P'2  with  L 2  norm  1  as  chosen  in  the  same  manner  as  for  the 
first  two  cases. 

For  each  randomly  chosen  function  g,  we  computed  the  column  vector  [PQg\p  using  the 
three  quadrature  methods.  The  exact  value  [Pg\p  was  computed  with  a  295-point  classical 
quadrature  that  exactly  integrates  polynomials  up  to  degree  40,  as  computed  in  [XG10]. 
The  £2  norm  relative  error  was  averaged  over  104  randomly  generated  g  for  each  of  the  four 
classes  of  functions.  The  resulting  average  relative  errors  are  shown  in  Table  2. 

On  Pg,  all  three  quadrature  rules  achieve  very  high  accuracy,  with  the  Dunavant  quadra¬ 
ture  losing  one  digit  of  accuracy  and  both  Xiao/Gimbutas  and  bilinear  quadratures  correctly 
computing  the  orthogonal  projection  up  to  double  precision.  This  is  expected  since  all 
quadratures  are  designed  to  integrate  such  polynomial  functions  exactly. 

On  Pg,  neither  classical  quadratures  are  accurate  to  full  precision  because  both  classical 
quadratures  are  only  capable  of  exactly  integrating  polynomials  of  degree  up  to  11.  Since 
the  bilinear  quadrature  can  exactly  integrate  Pg  x  P6,  it  has  mean  error  on  the  order  of 
machine  precision. 

On  the  sets  C  and  TP,  none  of  the  quadratures  are  accurate  to  machine  precision  since 
none  of  the  functions  are  polynomials.  However,  the  bilinear  quadrature  achieves  better 
accuracy  than  the  classical  quadratures  despite  having  the  same  number  of  evaluation  points. 

The  existing  classical  quadratures  are  already  very  good,  integrating  non-polynomial 
functions  from  C  and  TP  with  several  digits  of  accuracy.  Additionally,  the  classical  quadra¬ 
ture  of  Xiao/Gimbutas  performs  better  than  the  Dunavant  quadrature  in  all  four  cases. 
However,  the  bilinear  quadrature  was  as  good  or  better  than  the  classical  quadratures  in 
each  case,  despite  using  the  same  number  of  evaluations.  This  result  is  explained  by  the  fact 
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n 

k 

a 

Koo  (F*W) 

0 

1 

0.00000 

1.00000e+0 

1 

3 

0.67739 

2.91852e+0 

2 

6 

0.79523 

7.50137e+0 

3 

10 

0.92888 

1.17526e+l 

4 

15 

0.97590 

2.68367e+l 

5 

21 

0.99701 

3.14417e+l 

6 

28 

1.00066 

6.42937e+l 

7 

36 

1.00711 

7.34237e+l 

8 

45 

1.00784 

1.03464e+2 

Table  3:  Numerical  results  for  fc-point  bilinear  quadratures  on  Pn  x  Pn  for  L2  on  the  interior  of  a 
square. 


Figure  2:  Evaluation  points  for  bilinear  quadratures  on  Pn  x  Pn  for  L 2  on  the  interior  of  a  square, 
for  n  =  4,  8. 


that  bilinear  quadratures  are  specifically  designed  for  the  orthogonal  projection  problem, 
while  classical  quadratures  are  designed  for  evaluating  a  linear  functional. 

3.5  Bilinear  quadratures  on  other  domains 

In  this  section  bilinear  quadratures  for  L 2  inner  products  of  polynomials  on  the  interiors 
of  a  square  and  a  circle  are  computed.  We  observe  that,  just  as  in  the  case  of  triangles, 
minimizing  according  to  (2.8)  produces  well-behaved  evaluation  points. 

For  the  case  of  the  square  domain  [ — 1,  l]2,  orthogonal  polynomials  are  Pn{x)Pm(y), 
where  Pn  is  the  nth  Legendre  polynomial.  Table  3  shows  the  minimized  leading  singu¬ 
lar  value  er  and  matrix  condition  number  re0 0  for  several  fc-point  bilinear  quadratures  on 
the  square.  Interestingly,  the  evaluation  points  on  the  square  do  not  appear  to  obey  any 
symmetries. 

Remark.  One  can  produce  a  classical  quadrature  scheme  on  the  square  by  simply  taking  the 
tensor  product  of  two  Gaussian  quadratures  on  an  interval.  However,  this  exactly  integrates 
basis  functions  of  the  form  xay^  with  0  <  a  <  n,  0  <  0  <  n,  rather  than  integrating 
polynomials  whose  total  degree  does  not  exceed  some  value. 

On  the  unit  disk,  an  orthogonal  basis  of  polynomials  is  given  in  polar  coordinates  by  the 
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n 

k 

a 

Koo  (F*W) 

0 

1 

0.00000 

1.00000e+0 

1 

3 

0.67617 

3.04857e+0 

2 

6 

0.79868 

5.50559e+0 

3 

10 

0.89712 

1.01509e+l 

4 

15 

0.94133 

1.59179e+l 

5 

21 

0.97804 

2.24193e+l 

6 

28 

1.00337 

3.94055e+l 

7 

36 

1.02908 

5.60579e+l 

8 

45 

1.07413 

6.75064e+l 

Table  4:  Numerical  results  for  fc-point  bilinear  quadratures  on  Pn  x  Pn  for  L 2  on  the  unit  disk. 


Figure  3:  Evaluation  points  for  bilinear  quadratures  on  Pn  x  P„  for  L2  on  the  unit  disk,  for  n  =  4,6. 


Zernike  polynomials  Zm^n{r^  9),  defined  by 

$)  ■ —  Qm.,n  (f)  COs(vTZ^)  ,  Z—  rn:n{T'i  - —  Qm,n  (f* )  sin(mfl) , 


where  n  >  m  >  0  are  integers  and  n  —  m  is  even.  Table  4  shows  the  minimized  leading 
singular  value  a  and  matrix  condition  number  for  several  fc-point  bilinear  quadratures 
on  the  unit  disk. 

3.6  Bilinear  quadrature  for  the  Sobolev  inner  product 

In  this  section  we  compute  bilinear  quadratures  that  evaluate  the  Sobolev  inner  product 

(f,g)Hi=  [  Df(x)  ■  A(x)Dg(x)  +  f(x)g{x)dx, 

Jn 

where  A(x)  is  symmetric  positive  definite  on  f l.  One  advantage  of  a  bilinear  quadrature  for 
H 1  is  that  the  above  integral  can  be  numerically  evaluated  using  only  point  evaluations  of 
/,  g  and  does  not  require  evaluating  any  derivatives. 

For  =  [ — 1,1],  bilinear  quadratures  for  H 1  on  Pra  x  P„  and  minimal  on  P„+i  l~lP^  were 
computed  for  two  positive  weight  functions  A(x)  =  l+x2  and  A( x)  =  ex .  Orthogonalization 
was  performed  by  starting  with  the  Legendre  polynomials  and  computing  the  Gram  matrix 
M  using  a  40-point  classical  Gaussian  quadrature. 
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n 

k 

G 

Koo(F*W) 

i 

2 

0.00000 

5.00000e+0 

2 

3 

0.00000 

1.38132e+l 

3 

4 

0.00000 

2.72011e+l 

4 

5 

0.00000 

6.59254e+l 

5 

6 

0.00000 

1.21461e+2 

6 

7 

0.00000 

1.86818e+2 

7 

8 

0.00000 

2.86549e+2 

8 

9 

0.00000 

4.22824e+2 

Table  5:  Numerical  results  for  bilinear  quadratures 

onP„xP„  for  H 

A(x)  =  1  +  x2. 

n 

k 

G 

Koo(F*W) 

i 

2 

0.00000 

4.52560e+0 

2 

3 

0.00000 

1.30446e+l 

3 

4 

0.00000 

2.49183e+l 

4 

5 

0.00000 

5.13338e+l 

5 

6 

0.00000 

9.28987e+l 

6 

7 

0.00000 

1.50063e+2 

7 

8 

0.00000 

2.28284e+2 

8 

9 

0.00000 

3.30651e+2 

1, 1]  with  the  weight  function 


Table  6:  Numerical  results  for  bilinear  quadratures  on  Pn  xPn  for  A/1  [ —  1 , 1]  with  the  weight  function 
A(x)  =  ex. 


In  Tables  5  and  6  the  singular  value  a  and  condition  number  Koo  are  shown  for  the 
two  bilinear  quadratures  for  H1.  In  all  cases,  cr  is  zero  up  to  machine  precision,  since  the 
exact  solution  to  the  minimization  (2.6)  is  the  roots  of  the  ( n  +  l)th-degree  ff  ^orthogonal 
polynomial,  just  as  for  Gaussian  quadratures.  We  observe  that  the  condition  number  of  the 
approximation  projection  matrix  F*W  is  larger  than  in  the  L 2  case.  This  can  be  explained 
by  the  fact  that  small  perturbations  in  the  function  values  can  lead  to  large  perturbations 
in  the  derivatives. 


4  Conclusions 

A  quadrature  framework  for  numerically  evaluating  a  continuous  bilinear  form  on  function 
spaces  has  been  presented,  and  an  optimization  procedure  for  computing  such  quadratures 
has  been  outlined.  We  have  argued  that  this  is  the  correct  approach  to  numerically  evalu¬ 
ating  orthogonal  projections  of  functions  onto  a  fixed  subspace. 

We  have  also  observed  that  the  optimization  approach  for  finding  bilinear  quadratures 
does  not  depend  on  the  ambient  dimension,  the  domain  of  integration,  or  the  function  space 
to  be  integrated  exactly.  Despite  this  generality,  in  our  numerical  experiments  we  found  the 
resulting  quadratures  perform  well,  achieving  both  efficiency  and  accuracy. 

There  are  several  topics  to  explore  in  future  work.  One  is  the  construction  and  utiliza¬ 
tion  of  bilinear  quadratures  tailored  to  specific  high-order  Galerkin  methods.  Another  is  the 
investigation  of  the  performance  of  bilinear  quadratures  for  evaluating  other  (non-Sobolev) 
bilinear  forms.  Yet  another  finding  an  efficient  numerical  method  for  solving  the  optimiza¬ 
tion  problem  (2.15)  for  the  non-invertible  case.  In  that  case,  a  bilinear  quadrature  is  not 
uniquely  determined  by  its  evaluation  points,  and  the  optimization  problem  gains  many 
additional  degrees  of  freedom.  Lastly,  one  could  investigate  the  use  of  bilinear  quadratures 
for  solving  integral  equations.  Such  quadratures  may  prove  useful  in  the  Nystrom  discretiza- 
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tion  of  Fredholm  integral  operators  [Bol72]  or  boundary  integral  equations  on  domains  with 
corners  [BRS10]. 
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